Abstract

A repeated network game is considered in which agents have quadratic utilities that depend on information
externalities (an unknown underlying state) as well as payoff externalities (the actions of all other agents
in the network). Agents play Bayesian Nash Equilibrium strategies with respect to their
beliefs on the state of the world and the actions of all other nodes in the network. These beliefs are 
refined over subsequent stages based on the observed actions of neighboring peers. This paper introduces 
the Quadratic Network Game (QNG) filter that agents can run locally to update their beliefs, select 
corresponding optimal actions, and eventually learn a sufficient statistic of the network's state. The QNG 
filter is demonstrated on a Cournot market competition game and a coordination game to implement 
navigation of an autonomous team. 

I. Introduction 

Games with information and payoff externalities are common models of networked economic behavior. 
In trade decisions in a stock market, for example, the payoff that a player receives depends not only on the
fundamental (unknown) price of the stock but also on the buy decisions of other market participants. Thus,
players must respond to both their belief on the price of the stock and their belief on the actions of other
players [2]. Similar games can also be used to model the coordination of members of an autonomous
team whereby agents want to select an action that is jointly optimal but only have partial knowledge 
about what the action of other members of the team will be. Consequently, agents select actions that they 
deem optimal given what they know about the task they want to accomplish and the actions they expect 
other agents to take. 

Work in this paper is supported by ARO W911NF-10-1-0388, NSF CAREER CCF-0952867, NSF CCF-1017454, and AFOSR
MURI FA9550-10-1-0567. Part of the results in this paper have been submitted to ICASSP 2013 [1].

In both of the examples in the previous paragraph we have a network of autonomous agents intent on
selecting actions that maximize local utilities that depend on an unknown state of the world (information
externalities) and also the unknown actions of all other agents (payoff externalities). In a Bayesian setting
(or a rational setting, to use the nomenclature common in the economics literature [3]) nodes form a
belief on the actions of their peers and select an action that maximizes the expected payoff with respect 
to those beliefs. In turn, forming these beliefs requires that each network element make a model of how 
other members will respond to their local beliefs. The natural assumption is that they exhibit the same 
behavior, namely that they are also maximizing their expected payoffs with respect to a model of other 
nodes' responses. But that means the first network element needs a model of other agents' models which 
shall include their models of his model of their model and so on. The fixed point of this iterative chain 
of reasoning is a Bayesian Nash Equilibrium (BNE). 

In this paper we consider repeated versions of this game in which agents observe the actions taken 
by neighboring agents at a given time. In observing neighboring actions agents have the opportunity to 
learn about the private information that neighbors are, perhaps unwillingly, revealing Q. Acquiring this 
information alters agents' beliefs leading to the selection of new actions which become known at the 
next play prompting further reevaluation of beliefs and corresponding actions. In this context we talk of 
Bayesian learning because the agents' goal can be reinterpreted as the eventual learning of peers' actions 
so that expected payoffs coincide with actual payoffs. This paper considers Gaussian prior distributions 
and quadratic utilities. For this type of problem we introduce the Quadratic Network Game (QNG) filter 
that agents can run locally to update their beliefs, select corresponding actions that maximize expected 
payoffs, and eventually learn a sufficient statistic of the network's state. 

The burden of computing a BNE in repeated games is, in general, overwhelming even for small sized 
networks [4]. This intractability has led to the study of simplified models in which agents are non-Bayesian 
and update their beliefs according to some heuristic rule [5]-[9]. A different simplification is obtained
in models with pure information externalities where payoffs depend on the self action and an underlying
state but not on the actions of others. This is reminiscent of distributed estimation [10]-[19] since agents
deduce the state of the world by observing neighboring actions without strategic considerations on the
actions of peers. Computations are still intractable in the case of pure information externalities and for
the most part only asymptotic analyses of learning dynamics with rational agents are possible [20]-[22].
Explicit methods to maximize expected payoffs given all past observations of neighboring actions are
available only when signals are Gaussian [4] or when the network structure is a tree [23]. For the network
games considered here in which there are information as well as payoff externalities, not much is known
besides asymptotic analyses of learning dynamics [24]-[26].

The specific setting considered in this paper is introduced in Section II. Agents repeatedly play a game
whose payoffs are represented by a utility function that is quadratic in the actions of all agents and an
unknown real-valued parameter. At the start of the game each agent makes a private observation of the
unknown parameter corrupted by additive Gaussian noise. At each play stage agents observe actions of
adjacent peers from the previous stage that they incorporate into a local observation history which they
use to update their inference of the unknown parameter, and synchronously take actions that maximize
their expected payoffs. Actions that maximize expected payoffs with respect to local observation histories
are defined as best responses to the expected actions taken by other agents. When the expected actions
of other agents are also modeled as best responses with respect to their respective observation histories,
we say that the network settles into a BNE (Section II-A).



In Section III we determine a mechanism to calculate BNE actions from the perspective of an outside 
clairvoyant observer that knows all private observations. For this clairvoyant observer the trajectory of the 
game is completely determined but individual agents operate by forming a belief on the private signals of 
other agents. We start from the assumption that this probability distribution is normal with an expectation 
that, from the perspective of the outside observer, can be written as a linear combination of the actual 
private signals. If such is the case, we prove that there exists a set of linear equations that can be solved 
to obtain actions that are linear combinations of estimates of private signals (Lemma 1). This is then used
to show that after observing the actions of their respective adjacent peers the probability distributions on
private signals of all agents remain Gaussian with expectations that are still linear combinations of the
actual private signals (Lemma 2). We proceed to close a complete induction loop to derive a recursive
expression that the outside clairvoyant observer can use to compute BNE actions for all game stages
(Theorem 1).



In Section IV we leverage the recursion derived in Section III to derive the QNG filter that agents can
run locally, i.e., without access to all private signals, to compute their BNE action. Results in Sections
III and IV are generalized to the case of vector states and observations (Section V). We apply the scalar
QNG filter to a Cournot competition model (Section VI) and to the coordinated movement of a team of
mobile agents (Section VII).

Notation. Vectors $\mathbf{v} \in \mathbb{R}^n$ are written in boldface and matrices $A \in \mathbb{R}^{n \times m}$ in uppercase. We use $\mathbf{0}$ to
denote all-zero matrices or vectors of proper dimension. If the dimension is not clear from context, we
specify $\mathbf{0}_{n \times m}$. We use $\mathbf{1}$ to denote all-one matrices or vectors of proper dimension and $\mathbf{1}_{n \times m}$ to clarify
dimensions. We use $\mathbf{e}_i$ to denote the $i$th element of the standard orthonormal basis of $\mathbb{R}^n$ and $\bar{\mathbf{e}}_i := \mathbf{1} - \mathbf{e}_i$
to write an all-one vector with the $i$th component nulled.

II. Gaussian Quadratic Games 

We consider games with incomplete information in which N identical agents in a network repeatedly 
choose actions and receive payoffs that depend on their own actions, an unknown scalar parameter $\theta \in \mathbb{R}$,
and actions of all other agents. The network is represented by an undirected connected graph $G = (V, E)$
with node set $V = \{1, \ldots, N\}$ and edge set $E$. The network structure restricts the information available
to agent $i$ who is assumed to observe actions of agents $j$ in its neighborhood $n(i) := \{j : \{j, i\} \in E\}$
composed of agents that share an edge with him. The degree of node $i$ is given by the cardinality of the
set $n(i)$ and denoted as $d(i) := \#n(i)$. The neighbors of $i$ are denoted $j_{i,1} < \ldots < j_{i,d(i)}$. We assume
the network graph G is known to all agents. 
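
As a concrete illustration of the neighborhood notation, the following Python sketch builds $n(i)$ and $d(i)$ from an edge list. The function name `neighborhoods` and the three-agent line network are assumptions made for this example only; they do not appear in the paper.

```python
def neighborhoods(num_agents, edges):
    """Return {i: sorted neighbor list n(i)} for an undirected graph on agents 1..N."""
    nbrs = {i: set() for i in range(1, num_agents + 1)}
    for i, j in edges:
        nbrs[i].add(j)  # undirected edge {i, j}: each endpoint observes the other
        nbrs[j].add(i)
    return {i: sorted(ns) for i, ns in nbrs.items()}


# Hypothetical line network 1 - 2 - 3.
n = neighborhoods(3, [(1, 2), (2, 3)])
d = {i: len(ns) for i, ns in n.items()}  # degrees d(i) = #n(i)
```

Sorting each neighbor list mirrors the ordering $j_{i,1} < \ldots < j_{i,d(i)}$ used in the text.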

At time $t = 0$ agent $i$ observes a private signal $x_i \in \mathbb{R}$ which we model as being given by the unknown
parameter $\theta$ contaminated with zero mean additive Gaussian noise $\epsilon_i$,

$$x_i = \theta + \epsilon_i. \qquad (1)$$

The noise variances are denoted as $c_i := E[\epsilon_i^2]$ and grouped in the vector $\mathbf{c} := [c_1, \ldots, c_N]^T$ which
is assumed known to all agents. The noise terms $\epsilon_i$ are further assumed independent across agents. For
future reference define the vector of private signals $\mathbf{x} := [x_1, \ldots, x_N]^T \in \mathbb{R}^{N \times 1}$ grouping all local
observations.
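
The signal model in (1) can be simulated with a few lines of Python. This is only a sketch: the helper name `draw_private_signals`, the value of $\theta$, and the variances are hypothetical choices for illustration.

```python
import random


def draw_private_signals(theta, variances, rng):
    """Sample x_i = theta + eps_i with independent eps_i ~ N(0, c_i), as in (1)."""
    # random.Random.gauss takes a standard deviation, hence the square root of c_i.
    return [theta + rng.gauss(0.0, c ** 0.5) for c in variances]


rng = random.Random(0)   # fixed seed so that the draw is reproducible
c = [1.0, 0.25, 4.0]     # hypothetical noise variances c_1, c_2, c_3
x = draw_private_signals(1.0, c, rng)
```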

Consider a discrete time variable $t = 0, 1, 2, \ldots$ to index subsequent stages of the game. At each stage
$t$ agent $i$ takes a scalar action $a_i(t) \in \mathbb{R}$. The selection of agent $i$, along with the concurrent selections
$a_j(t)$ of all other agents $j \in V \setminus \{i\}$, results in a payoff $u_i(a_i(t), \{a_j(t)\}_{j \in V \setminus i}, \theta)$ that agent $i$ wants to
make as large as possible. In this paper we restrict attention to quadratic payoffs which for simplicity
we assume to be time invariant. Specifically, selection of actions $\{a_i = a_i(t)\}_{i \in V}$ when the state of the
world is $\theta$ results in agent $i$ experiencing a reward

$$u_i(a_i, \{a_j\}_{j \in V \setminus i}, \theta) := -\frac{1}{2} a_i^2 + \sum_{j \in V \setminus i} \beta_{ij} a_i a_j + \delta a_i \theta, \qquad (2)$$

where $\beta_{ij} \in \mathbb{R}$ for all $i \in V$, $j \in V \setminus i$ and $\delta \in \mathbb{R}$ are real valued constants. Notice that since
$\partial^2 u_i / \partial a_i^2 = -1 < 0$, the payoff function in (2) is strictly concave with respect to the self action $a_i$ of
agent $i$.
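
To make the concavity remark tangible, the sketch below evaluates the payoff in (2) and its maximizer for the idealized case in which $\theta$ and the other actions are known. That complete-information case is an assumption used only for illustration (it is not the setting of the paper), and the helper names and numbers are hypothetical.

```python
def payoff(i, actions, theta, beta, delta):
    """Quadratic payoff u_i in (2) for agent i (0-indexed)."""
    a_i = actions[i]
    interaction = sum(beta[i][j] * a_i * a_j
                      for j, a_j in enumerate(actions) if j != i)
    return -0.5 * a_i ** 2 + interaction + delta * a_i * theta


def best_response(i, actions, theta, beta, delta):
    """Maximizer of (2) in a_i: set du_i/da_i = -a_i + sum_j beta_ij a_j + delta*theta to zero."""
    return sum(beta[i][j] * a_j
               for j, a_j in enumerate(actions) if j != i) + delta * theta


beta = [[0.0, 0.5], [0.5, 0.0]]   # hypothetical payoff weights beta_ij
a_star = best_response(0, [0.0, 1.0], 2.0, beta, 1.0)
u_star = payoff(0, [a_star, 1.0], 2.0, beta, 1.0)
```

Because (2) is strictly concave in $a_i$, any deviation from `a_star` lowers the payoff of agent 0 in this example.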
Although the goal of agent $i$ is to select the action $a_i(t)$ that maximizes the payoff in (2), this is
not possible because neither the state $\theta$ nor the actions $\{a_j(t)\}_{j \in V \setminus i}$ are known to him. Rather, agent
$i$ needs to reason about the state $\theta$ and the actions $\{a_j(t)\}_{j \in V \setminus i}$ based on its available information. At time
$t = 0$ only the private signal $x_i$ is known. Define then the initial information as $h_{i,0} = \{x_i\}$. The
information $h_{i,0}$ is used to reason about $\theta$ and the initial actions $\{a_j(0)\}_{j \in V \setminus i}$ that other agents are to
take in the initial stage of the game. At the playing of this stage, agent $i$ observes the actions
$\mathbf{a}_{n(i)}(0) := [a_{j_{i,1}}(0), \ldots, a_{j_{i,d(i)}}(0)]^T \in \mathbb{R}^{d(i) \times 1}$ of all agents in his neighborhood. These observed neighboring actions
become part of the observation history $h_{i,1} = \{x_i, \mathbf{a}_{n(i)}(0)\} = \{h_{i,0}, \mathbf{a}_{n(i)}(0)\}$ which allows agent $i$ to
improve on his estimate of $\theta$ and the actions $\{a_j(1)\}_{j \in V \setminus i}$ that other agents will play on the first stage
of the game, thereby also affecting the selection of its own action $a_i(1)$. In general, at any point in time
$t$ the history of observations $h_{i,t}$ is augmented to incorporate the actions of neighbors in the previous
stage,

$$h_{i,t} := \{h_{i,t-1}, \mathbf{a}_{n(i)}(t-1)\} = \{x_i, \mathbf{a}_{n(i)}(u), u < t\}. \qquad (3)$$

The observed action history $h_{i,t}$ is then used to update the estimates of the world state $\theta$ and the upcoming
actions $\{a_j(t)\}_{j \in V \setminus i}$ of other agents leading to the selection of the action $a_i(t)$ in the current stage
of the game.
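
The recursion in (3) is simple bookkeeping, which the following sketch makes explicit. The dictionary layout and the function names `initial_history` and `extend_history` are hypothetical; the observed action values are made up.

```python
def initial_history(x_i):
    """h_{i,0} = {x_i}: only the private signal is known at t = 0."""
    return {"signal": x_i, "observed": []}


def extend_history(history, neighbor_actions):
    """Return h_{i,t} built from h_{i,t-1} and a_{n(i)}(t-1), as in (3)."""
    return {"signal": history["signal"],
            "observed": history["observed"] + [list(neighbor_actions)]}


h0 = initial_history(1.3)
h1 = extend_history(h0, [0.7, 2.1])   # actions of two neighbors at stage 0
h2 = extend_history(h1, [0.9, 1.8])   # actions of the same neighbors at stage 1
```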

The final components of the game that we introduce are the strategies $\sigma_{i,t}$ that are used to map histories
to actions. In this paper we focus on pure strategies that can be written as functions that map history
realizations $h_{i,t}$ to actions $a_i(t)$,

$$\sigma_{i,t} : h_{i,t} \mapsto a_i(t). \qquad (4)$$

We emphasize the difference between strategy and action. An action $a_i(t)$ is the play of agent $i$ at time $t$,
whereas strategies $\sigma_{i,t}$ refer to the map of histories to actions. We can think of the action $a_i(t) = \sigma_{i,t}(h_{i,t})$
as the value of the strategy function $\sigma_{i,t}$ associated with the given observed history $h_{i,t}$. Further define the
strategy of agent $i$ as the concatenation $\sigma_i := \{\sigma_{i,u}\}_{u=0,\ldots,\infty}$ of strategies that agent $i$ plays at all times.
Use $\sigma_t := \{\sigma_{i,t}\}_{i \in V}$ to refer to the strategies of all players at time $t$, $\sigma_{0:t} := \{\sigma_u\}_{u=0,\ldots,t}$ to represent the
strategies played by all players between times $0$ and $t$, and $\sigma := \{\sigma_u\}_{u=0,\ldots,\infty} = \{\sigma_i\}_{i \in V}$ to denote the
strategy profile for all agents $i$ and times $t$. As in the case of the network topology, the strategy profile $\sigma$
is also assumed to be known to all agents. We study mechanisms for the construction of strategies in the
following section.

A. Bayesian Nash equilibria

Given that agent $i$ wants to maximize the utility in (2) but has access to the partial information available
in the observed history $h_{i,t}$ in (3), a reasonable strategy $\sigma_{i,t}$ is to select the action $a_i(t)$ that maximizes
the expected utility with respect to the history $h_{i,t}$. To write this formally note that this expected utility
depends on strategies $\sigma_{0:t-1}$ played in the past by all agents and on strategies $\{\sigma_{j,t}\}_{j \in V \setminus i}$ that all other
agents are to play in the upcoming turn. Fix then the past strategies $\sigma_{0:t-1}$ and the upcoming strategies
$\{\sigma_{j,t}\}_{j \in V \setminus i}$ of other players and define the corresponding best response of player $i$ at time $t$ as

$$\mathrm{BR}_{i,t}\big(\sigma_{0:t-1}, \{\sigma_{j,t}\}_{j \in V \setminus i}\big) := \operatorname*{argmax}_{a_i \in \mathbb{R}} E_{\sigma_{0:t-1}}\big[ u_i\big(a_i, \{\sigma_{j,t}(h_{j,t})\}_{j \in V \setminus i}, \theta\big) \,\big|\, h_{i,t} \big]. \qquad (5)$$

The strategies $\sigma_{0:t-1}$ in (5) played at previous times mapped respective histories $\{h_{j,u}\}_{j \in V}$ to actions
$\{a_j(u)\}_{j \in V}$ for $u < t$. Therefore, the past strategies $\sigma_{0:t-1}$ determine the manner in which agent $i$ updates
his beliefs on the state of the world $\theta$ and on the histories $\{h_{j,t}\}_{j \in V \setminus i}$ observed by other agents. As per (4)
the strategy profiles $\{\sigma_{j,t}\}_{j \in V \setminus i}$ of other players in the current stage permit transformation of history
beliefs $\{h_{j,t}\}_{j \in V \setminus i}$ into a probability distribution over respective upcoming actions $\{a_j(t)\}_{j \in V \setminus i}$. The
resulting joint distribution on $\{a_j(t)\}_{j \in V \setminus i}$ and $\theta$ permits evaluation and maximization of the expectation
in (5).

One can think of the profiles $\{\sigma_{j,t}\}_{j \in V \setminus i}$ played by other agents in the upcoming stage as the model
agent $i$ makes of the behavior of other agents. In that sense the sensible assumption is that other agents
are also playing best response to a best response model of other agents. That is, agent $i$ assumes agent $j$
is playing the best response to its respective model of the behavior of other agents and that the model
agent $j$ makes of these responses is that these agents also play best response to a best response model.
This modeling assumption leads to the definition of Bayesian Nash equilibria (BNE) as the solution to
the fixed point equation

$$\sigma^*_{i,t}(h_{i,t}) = \mathrm{BR}_{i,t}\big(\sigma^*_{0:t-1}, \{\sigma^*_{j,t}\}_{j \in V \setminus i}\big), \quad \text{for all } h_{i,t}, \qquad (6)$$

where we have also added the restriction that an equilibrium strategy $\sigma^*_{0:t-1}$ has been played for all
times $u < t$. We emphasize that (6) needs to be satisfied for all possible histories $h_{i,t}$ and not just for
the history realized in a particular game realization. This is necessary because agent $i$ doesn't know
the history observed by agent $j$ but rather a probability distribution on histories. Thus, to evaluate the
expectation in (5) agent $i$ needs a representation of the equilibrium strategy for all possible histories $h_{j,t}$.

If all agents play their BNE strategies as defined in (6), $\sigma^*_{i,t}$ becomes optimal in the usual game
theoretic sense. There is no strategy that agent $i$ could unilaterally deviate to that provides a higher
expected payoff than $\sigma^*_{i,t}$ [cf. (5)]. In that sense the BNE strategy is the best that agent $i$ can do given
other agents' strategies and his locally available information $h_{i,t}$. In the rest of the paper we consider
agents playing with respect to the BNE strategy $\sigma^*_{i,t}$ at all times. To simplify future notation define the
expectation operator

$$E_{i,t}[\,\cdot\,] := E_{\sigma^*_{0:t-1}}[\,\cdot \mid h_{i,t}], \qquad (7)$$

to represent expectations with respect to the local history $h_{i,t}$ when agents have played the equilibrium
strategy $\sigma^*_{0:t-1}$ in earlier stages of the game. Similarly, we define the conditional probability distribution
of agent $i$ at time $t$ given past strategies $\sigma^*_{0:t-1}$ and his information $h_{i,t}$ as $P_{i,t}(\cdot) := P_{\sigma^*_{0:t-1}}(\cdot \mid h_{i,t})$.

Since $u_i(a_i, \{a_j\}_{j \in V \setminus i}, \theta)$ is a strictly concave quadratic function of $a_i$ as per (2), the same is true of
the expected utility $E_{i,t}[u_i(a_i, \{\sigma_{j,t}\}_{j \in V \setminus i}, \theta)]$ that we maximize to obtain the best response in (5). We
can then rewrite (5) by nulling the derivative of the expected utility with respect to $a_i$. It follows that
the fixed point equation in (6) can be rewritten as the set of equations

$$\sigma^*_{i,t}(h_{i,t}) = \sum_{j \in V \setminus \{i\}} \beta_{ij} E_{i,t}\big[\sigma^*_{j,t}(h_{j,t})\big] + \delta E_{i,t}[\theta], \qquad (8)$$

that need to be satisfied for all possible histories $h_{i,t}$ and agents $i$. Our goal is to develop a filter that
agents can use to compute their equilibrium actions $a^*_i(t) := \sigma^*_{i,t}(h_{i,t})$ given their observed history $h_{i,t}$.
We pursue this in the following section after some remarks.
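
The first-order condition behind this fixed point equation can be spelled out as follows. This is a sketch of the intermediate step, assuming (as the pure-strategy setup implies) that agent $i$'s own action is deterministic given $h_{i,t}$ and can therefore be pulled out of the expectation.

```latex
\begin{align*}
E_{i,t}\big[u_i\big(a_i, \{\sigma_{j,t}(h_{j,t})\}_{j \in V \setminus i}, \theta\big)\big]
  &= -\tfrac{1}{2} a_i^2
     + a_i \sum_{j \in V \setminus i} \beta_{ij}\, E_{i,t}\big[\sigma_{j,t}(h_{j,t})\big]
     + \delta\, a_i\, E_{i,t}[\theta], \\
\frac{\partial}{\partial a_i} E_{i,t}[u_i]
  &= -a_i + \sum_{j \in V \setminus i} \beta_{ij}\, E_{i,t}\big[\sigma_{j,t}(h_{j,t})\big]
     + \delta\, E_{i,t}[\theta] = 0 .
\end{align*}
```

Solving the second line for $a_i$ and requiring every agent to use its equilibrium strategy gives the fixed point condition labeled (8).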

Remark 1 It may be of interest to modify the utility in (2) to include more additive terms that are
functions of other actions $\{a_j\}_{j \in V \setminus i}$ and the state of the world but not of the self action $a_i$. This may
change the utility and the expected utility in (5) but doesn't change the equilibrium strategy in (6). Since
these terms do not contain the self action $a_i$, their derivatives are null and do not alter the fixed point
equation in (8).

Remark 2 The equilibrium notion in (6) is based on the premise of myopic agents that choose actions that
optimize payoffs at the present game stage. A more general model is to consider non-myopic agents that
consider discounted payoffs of future stages. Non-myopic behavior introduces another layer of strategic
reasoning. Forward looking agents would need to take into account the effect of their decisions at each
stage of the game on the future path of play knowing that other agents base their future decisions on what
they have previously observed. E.g., non-myopic agents might reduce their immediate payoff to harvest
information that may result in future gains. Extensions to games with non-myopic agents are beyond the
scope of this paper.

III. Propagation of probability distributions 

According to the model in (8), at each stage of the game agents use the observed history $h_{i,t}$ to estimate
the unknown parameter $\theta$ as well as the histories $\{h_{j,t}\}_{j \in V \setminus i}$ observed by other agents. They use the
latter and the known BNE strategy $\{\sigma^*_{j,t}(h_{j,t})\}_{j \in V \setminus i}$ to form a belief $P_{i,t}(\{a^*_j(t)\}_{j \in V \setminus i})$ on the actions
$\{a^*_j(t)\}_{j \in V \setminus i}$ of other agents which they use to compute their equilibrium action $a^*_i(t)$ at time $t$. Observe
that if the vector of private signals $\mathbf{x} := [x_1, \ldots, x_N]^T$ is given (not to the agents but to an outside
observer) the trajectory of the game is completely determined as there are no random decisions. Thus,
agent $i$ can form beliefs on the histories $\{h_{j,t}\}_{j \in V \setminus i}$ and actions $\{a^*_j(t)\}_{j \in V \setminus i}$ of other agents if it keeps
a local belief $P_{i,t}(\mathbf{x})$ on the vector of private signals $\mathbf{x}$. A method to track this probability distribution
is derived in this section using a complete induction argument.

Start by assuming that at given time $t$, the posterior distribution $P_{i,t}(\mathbf{x})$ is normal. Recalling the
definition of the expectation operator $E_{i,t}[\,\cdot\,]$ in (7), the mean of this normal distribution is $E_{i,t}[\mathbf{x}]$.
Define the corresponding error covariance matrix $M^i_{xx}(t) \in \mathbb{R}^{N \times N}$ as

$$M^i_{xx}(t) := E_{i,t}\big[(\mathbf{x} - E_{i,t}[\mathbf{x}])(\mathbf{x} - E_{i,t}[\mathbf{x}])^T\big]. \qquad (9)$$

Although agent $i$'s probability distribution for $\mathbf{x}$ is sufficient to describe its belief on the state of the
system, subsequent derivations are simpler if we keep an explicit belief on the state of the world $\theta$.
Therefore, we also assume that agent $i$'s beliefs on $\theta$ and $\mathbf{x}$ are jointly Gaussian given history $h_{i,t}$. The
mean of $\theta$ is $E_{i,t}[\theta]$ and the corresponding variance is

$$M^i_{\theta\theta}(t) := E_{i,t}\big[(\theta - E_{i,t}[\theta])(\theta - E_{i,t}[\theta])\big]. \qquad (10)$$

The cross covariance $M^i_{\theta x}(t) \in \mathbb{R}^{1 \times N}$ between the world state $\theta$ and the private signals $\mathbf{x}$ is

$$M^i_{\theta x}(t) := E_{i,t}\big[(\theta - E_{i,t}[\theta])(\mathbf{x} - E_{i,t}[\mathbf{x}])^T\big]. \qquad (11)$$

We further make the stronger assumption that the means of this joint Gaussian distribution can be written
as linear combinations of the private signals. In particular, we assume that for some known matrix
$L_{i,t} \in \mathbb{R}^{N \times N}$ and vector $\mathbf{k}_{i,t} \in \mathbb{R}^{N \times 1}$ we can write

$$E_{i,t}[\mathbf{x}] = L_{i,t} \mathbf{x}, \qquad E_{i,t}[\theta] = \mathbf{k}_{i,t}^T \mathbf{x}. \qquad (12)$$

Observe that the assumption in (12) is not that the estimates $E_{i,t}[\mathbf{x}]$ and $E_{i,t}[\theta]$ are computed as linear
combinations of the private signals $\mathbf{x}$; indeed, $\mathbf{x}$ is not known by agent $i$ in general. The assumption is
that from the perspective of an external observer the actual computations that agents do are equivalent
to the linear transformations in (12).

Under the complete induction hypothesis of Gaussian posterior beliefs at time $t$ with expectations as
in (12), we show that agents play according to linear equilibrium strategies of the form

$$\sigma^*_{i,t}(h_{i,t}) = \mathbf{v}_{i,t}^T E_{i,t}[\mathbf{x}], \qquad (13)$$

for some action coefficients $\mathbf{v}_{i,t} \in \mathbb{R}^{N \times 1}$ that vary across agents but are independent of the observed
history $h_{i,t}$. These can be found by solving a system of linear equations. We do this in the following
lemma.



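Lemma 1 Consider a Bayesian game with quadratic utility as in (2). Suppose that for all agents $i$, the
joint posterior beliefs $P_{i,t}([\theta, \mathbf{x}^T])$ on the state of the world $\theta$ and the private signals $\mathbf{x}$ given the local
history $h_{i,t}$ at time $t$ are Gaussian with means expressed as the linear combinations of private signals in
(12) for some known vectors $\mathbf{k}_{i,t}$ and matrices $L_{i,t}$. Define the aggregate vector $\mathbf{k}_t := [\mathbf{k}_{1,t}^T, \ldots, \mathbf{k}_{N,t}^T]^T \in \mathbb{R}^{N^2 \times 1}$
stacking the state estimation weights of all agents and the block matrix $L_t \in \mathbb{R}^{N^2 \times N^2}$ with $N \times N$
diagonal blocks $((L_t))_{ii} = L_{i,t}^T$ and off diagonal blocks $((L_t))_{ij} = -\beta_{ij} L_{i,t}^T L_{j,t}^T$,

$$L_t := \begin{pmatrix} L_{1,t}^T & -\beta_{12} L_{1,t}^T L_{2,t}^T & \cdots & -\beta_{1N} L_{1,t}^T L_{N,t}^T \\ -\beta_{21} L_{2,t}^T L_{1,t}^T & L_{2,t}^T & \cdots & -\beta_{2N} L_{2,t}^T L_{N,t}^T \\ \vdots & \vdots & \ddots & \vdots \\ -\beta_{N1} L_{N,t}^T L_{1,t}^T & -\beta_{N2} L_{N,t}^T L_{2,t}^T & \cdots & L_{N,t}^T \end{pmatrix}. \qquad (14)$$

If there exists a linear equilibrium strategy as in (13), the action coefficients $\mathbf{v}_t := [\mathbf{v}_{1,t}^T, \ldots, \mathbf{v}_{N,t}^T]^T \in \mathbb{R}^{N^2 \times 1}$
can be obtained by solving the system of linear equations

$$L_t \mathbf{v}_t = \delta \mathbf{k}_t. \qquad (15)$$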



Proof: We hypothesize that agents play according to a linear equilibrium strategy as in (13).
Substituting this candidate strategy into the equilibrium equations in (8) yields

$$\mathbf{v}_{i,t}^T E_{i,t}[\mathbf{x}] = \sum_{j \in V \setminus \{i\}} \beta_{ij} E_{i,t}\big[\mathbf{v}_{j,t}^T E_{j,t}[\mathbf{x}]\big] + \delta E_{i,t}[\theta]. \qquad (16)$$

The summation in (16) includes the expectations $E_{i,t}[E_{j,t}[\mathbf{x}]]$ of agent $i$ on the private signals' estimate
of agent $j$. As per the induction hypothesis in (12), we have that the inner expectations can be written
as $E_{j,t}[\mathbf{x}] = L_{j,t} \mathbf{x}$. Using this fact, agent $i$'s expectation of agent $j$'s estimate of private signals becomes

$$E_{i,t}\big[E_{j,t}[\mathbf{x}]\big] = E_{i,t}[L_{j,t} \mathbf{x}] = L_{j,t} E_{i,t}[\mathbf{x}]. \qquad (17)$$

Substituting the estimate induction hypotheses in (12) into (17), using the result in (16), and reordering
terms yields the set of equations

$$\mathbf{v}_{i,t}^T L_{i,t} \mathbf{x} - \sum_{j \in V \setminus \{i\}} \beta_{ij} \mathbf{v}_{j,t}^T L_{j,t} L_{i,t} \mathbf{x} = \delta \mathbf{k}_{i,t}^T \mathbf{x}. \qquad (18)$$

At this point we recall that the equilibrium equations in (8) are true for all possible histories $h_{i,t}$. Therefore,
the equilibrium equations in (18), which are derived from (8), have to hold irrespectively of the history's
realization. This in turn means that they will be true for all possible values of $\mathbf{x}$. This can be ensured by
equating the coefficients that multiply each component of $\mathbf{x}$ in (18) thereby yielding the relationships

$$L_{i,t}^T \mathbf{v}_{i,t} - \sum_{j \in V \setminus \{i\}} \beta_{ij} L_{i,t}^T L_{j,t}^T \mathbf{v}_{j,t} = \delta \mathbf{k}_{i,t}, \qquad (19)$$

that need to hold true for all agents $i$. The result in (15) is just a restatement of (19) with the latter
corresponding to the $i$-th block of the relationship in (15). ■
Lemma 1 provides a mechanism to determine the strategy profiles $\sigma^*_{i,t}(\cdot)$ of all agents through the
computation of the action vectors $\mathbf{v}_{i,t}$ as a block of the vector $\mathbf{v}_t$ that solves (15). We emphasize that
the value of the weight vector $\mathbf{v}_t$ in (15) does not depend on the realization of private signals $\mathbf{x}$. This
is as it should be because the postulated equilibrium strategy in (13) assumes the action weights $\mathbf{v}_{i,t}$ are
independent of the observed history. A consequence of this fact is that the action coefficients $\{\mathbf{v}_{j,t}\}_{j \in V}$
of all agents can be determined locally by all agents as long as the matrices $L_{j,t}$ and vectors $\mathbf{k}_{j,t}$ are
common knowledge. The equilibrium actions $a^*_i(t)$, however, do depend on the observed history because
to determine the action $a^*_i(t) = \sigma^*_{i,t}(h_{i,t}) = \mathbf{v}_{i,t}^T E_{i,t}[\mathbf{x}]$ we multiply $\mathbf{v}_{i,t}$ by the expectation $E_{i,t}[\mathbf{x}]$
associated with the actual observed history $h_{i,t}$. See Section IV for details.
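
The construction in Lemma 1 can be sketched in plain Python: assemble the block matrix with diagonal blocks $L_{i,t}^T$ and off diagonal blocks $-\beta_{ij} L_{i,t}^T L_{j,t}^T$, then solve the linear system for the stacked action coefficients. The helper names (`assemble_system`, `solve`), the choice of Gaussian elimination, and all numerical values are assumptions made for illustration. The example supposes two agents whose beliefs already pin down $\mathbf{x}$ exactly ($L_{i,t} = I$) with equal weights $\mathbf{k}_{i,t} = [0.5, 0.5]^T$.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def assemble_system(L, k, beta, delta):
    """Build the N^2 x N^2 block matrix and the right-hand side delta * k_t."""
    N = len(L)
    big = [[0.0] * (N * N) for _ in range(N * N)]
    rhs = []
    for i in range(N):
        LiT = transpose(L[i])
        for j in range(N):
            if i == j:
                block = LiT                                   # diagonal block L_i^T
            else:
                prod = matmul(LiT, transpose(L[j]))           # L_i^T L_j^T
                block = [[-beta[i][j] * e for e in row] for row in prod]
            for r in range(N):
                for c in range(N):
                    big[i * N + r][j * N + c] = block[r][c]
        rhs.extend(delta * e for e in k[i])
    return big, rhs


def solve(A, b):
    """Gaussian elimination with partial pivoting; assumes A is nonsingular."""
    n = len(b)
    M = [row[:] + [b_r] for row, b_r in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


N = 2
I2 = [[1.0, 0.0], [0.0, 1.0]]
L = [I2, I2]                          # hypothetical L_{1,t}, L_{2,t}
k = [[0.5, 0.5], [0.5, 0.5]]          # hypothetical k_{1,t}, k_{2,t}
beta = [[0.0, 0.5], [0.5, 0.0]]       # hypothetical payoff weights
A, b = assemble_system(L, k, beta, delta=1.0)
v = solve(A, b)
v_blocks = [v[i * N:(i + 1) * N] for i in range(N)]   # v_{1,t}, v_{2,t}
```

In this example both agents obtain $\mathbf{v}_{i,t} = [1, 1]^T$, so each plays $x_1 + x_2$, which is $\delta / (1 - \beta_{12})$ times the common estimate $(x_1 + x_2)/2$ of $\theta$. The elimination routine fails when the block matrix is singular, in which case a different solver (for instance one returning a minimum norm solution) would be needed.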



At time $t$ agent $i$ computes its action vector $\mathbf{v}_{i,t}$ which it uses to select the equilibrium action
$a^*_i(t) = \mathbf{v}_{i,t}^T E_{i,t}[\mathbf{x}]$ as per (13). Since we have also hypothesized that $E_{i,t}[\mathbf{x}] = L_{i,t} \mathbf{x}$, as per (12) the action of
agent $i$ at time $t$ is given by

$$a^*_i(t) = \mathbf{v}_{i,t}^T L_{i,t} \mathbf{x}. \qquad (20)$$

We emphasize that as in (12) the expression in (20) is not the computation made by agent $i$ but an
equivalent computation from the perspective of an external omniscient observer.



February 4, 2013 



DRAFT 






The actions $\mathbf{a}_{n(i)}(t) := [a_{j_{i,1}}(t), \ldots, a_{j_{i,d(i)}}(t)]^T \in \mathbb{R}^{d(i) \times 1}$ of neighboring agents $j \in n(i)$ become part
of the observed history $h_{i,t+1}$ of agent $i$ at time $t + 1$ [cf. (3)]. The important consequence of (20) is
that these observations are a linear combination of private signals $\mathbf{x}$. In particular, by defining the matrix
$H_{i,t}^T := [\mathbf{v}_{j_{i,1},t}^T L_{j_{i,1},t}; \ldots; \mathbf{v}_{j_{i,d(i)},t}^T L_{j_{i,d(i)},t}] \in \mathbb{R}^{d(i) \times N}$ we can write

$$\mathbf{a}_{n(i)}(t) = \begin{bmatrix} \mathbf{v}_{j_{i,1},t}^T L_{j_{i,1},t} \\ \vdots \\ \mathbf{v}_{j_{i,d(i)},t}^T L_{j_{i,d(i)},t} \end{bmatrix} \mathbf{x} = H_{i,t}^T \mathbf{x}. \qquad (21)$$

Agent $i$'s belief of $\mathbf{x}$ at time $t$ is normally distributed; moreover, when we go from time $t$ to time $t + 1$,
agent $i$ observes a linear combination, $\mathbf{a}_{n(i)}(t) = H_{i,t}^T \mathbf{x}$, of private signals. Thus, the propagation of the
probability distribution when the history $h_{i,t+1}$ incorporates the actions $\mathbf{a}_{n(i)}(t)$ is a simple sequential
LMMSE estimation problem [27, Ch. 12]. In particular, the joint posterior distribution of $\mathbf{x}$ and $\theta$ given
$h_{i,t+1}$ remains Gaussian and the expectations $E_{i,t+1}[\mathbf{x}]$ and $E_{i,t+1}[\theta]$ remain linear combinations of
private signals $\mathbf{x}$ as in (12) for some matrix $L_{i,t+1}$ and vector $\mathbf{k}_{i,t+1}$ which we compute explicitly in the
following lemma.

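Lemma 2 Consider a Bayesian game with quadratic utility as in (2) and the same assumptions and
definitions of Lemma 1. Further define the observation matrix
$H_{i,t}^T := [\mathbf{v}_{j_{i,1},t}^T L_{j_{i,1},t}; \ldots; \mathbf{v}_{j_{i,d(i)},t}^T L_{j_{i,d(i)},t}] \in \mathbb{R}^{d(i) \times N}$ and the LMMSE gains

$$K^i_x(t) := M^i_{xx}(t) H_{i,t} \big(H_{i,t}^T M^i_{xx}(t) H_{i,t}\big)^{-1}, \qquad (22)$$
$$K^i_\theta(t) := M^i_{\theta x}(t) H_{i,t} \big(H_{i,t}^T M^i_{xx}(t) H_{i,t}\big)^{-1}, \qquad (23)$$

and assume that agents play the linear equilibrium strategy in (13). Then, the beliefs $P_{i,t+1}([\theta, \mathbf{x}^T])$ after
observing neighboring actions at time $t$ are Gaussian with means that can be expressed as the linear
combination of private signals

$$E_{i,t+1}[\mathbf{x}] = L_{i,t+1} \mathbf{x}, \qquad E_{i,t+1}[\theta] = \mathbf{k}_{i,t+1}^T \mathbf{x}, \qquad (24)$$

where the matrix $L_{i,t+1}$ and vector $\mathbf{k}_{i,t+1}$ are given by

$$L_{i,t+1} = L_{i,t} + K^i_x(t)\big(H_{i,t}^T - H_{i,t}^T L_{i,t}\big), \qquad (25)$$
$$\mathbf{k}_{i,t+1}^T = \mathbf{k}_{i,t}^T + K^i_\theta(t)\big(H_{i,t}^T - H_{i,t}^T L_{i,t}\big). \qquad (26)$$

The posterior covariance matrix $M^i_{xx}(t+1)$ for the private signals $\mathbf{x}$, the variance $M^i_{\theta\theta}(t+1)$ of the
state $\theta$, and the cross covariance $M^i_{\theta x}(t+1)$ are further given by

$$M^i_{xx}(t+1) = M^i_{xx}(t) - K^i_x(t) H_{i,t}^T M^i_{xx}(t), \qquad (27)$$
$$M^i_{\theta\theta}(t+1) = M^i_{\theta\theta}(t) - K^i_\theta(t) H_{i,t}^T \big(M^i_{\theta x}(t)\big)^T, \qquad (28)$$
$$M^i_{\theta x}(t+1) = M^i_{\theta x}(t) - K^i_\theta(t) H_{i,t}^T M^i_{xx}(t). \qquad (29)$$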

Proof: Since observations of $i$, $\mathbf{a}_{n(i)}(t) = H_{i,t}^T \mathbf{x}$, are linear combinations of private signals $\mathbf{x}$ which are
normally distributed, observations of $i$ are also normally distributed from the perspective of $i$. Furthermore,
by assumption (12), the prior distribution $P_{i,t}(\mathbf{x})$ is Gaussian. Hence, the posterior distribution, $P_{i,t+1}(\mathbf{x})$,
is also Gaussian. Specifically, the mean of the posterior distribution corresponds to the LMMSE estimator
with gain matrix $K^i_x(t) = M^i_{xx}(t) H_{i,t} (H_{i,t}^T M^i_{xx}(t) H_{i,t})^{-1}$; that is,

$$E_{i,t+1}[\mathbf{x}] = E_{i,t}[\mathbf{x}] + K^i_x(t)\big(\mathbf{a}_{n(i)}(t) - E_{i,t}[\mathbf{a}_{n(i)}(t)]\big). \qquad (30)$$

Because $\theta$ and $\mathbf{x}$ are jointly Gaussian at time $t$, $\theta$ and $\mathbf{a}_{n(i)}(t)$ are also jointly Gaussian. Therefore,
the posterior distribution $P_{i,t+1}(\theta)$ is also Gaussian. Consequently, the Bayesian estimate of $\theta$ is given
by a sequential LMMSE estimator with gain matrix $K^i_\theta(t) = M^i_{\theta x}(t) H_{i,t} (H_{i,t}^T M^i_{xx}(t) H_{i,t})^{-1}$,

$$E_{i,t+1}[\theta] = E_{i,t}[\theta] + K^i_\theta(t)\big(\mathbf{a}_{n(i)}(t) - E_{i,t}[\mathbf{a}_{n(i)}(t)]\big). \qquad (31)$$

Given the linear observation model in (21), agent $i$'s estimate of his observations at time $t$ is given by
$E_{i,t}[\mathbf{a}_{n(i)}(t)] = H_{i,t}^T E_{i,t}[\mathbf{x}]$. Substituting (12) for the mean estimates at time $t$ in (30) and (31), we obtain

$$E_{i,t+1}[\mathbf{x}] = L_{i,t} \mathbf{x} + K^i_x(t)\big(H_{i,t}^T \mathbf{x} - H_{i,t}^T L_{i,t} \mathbf{x}\big), \qquad (32)$$
$$E_{i,t+1}[\theta] = \mathbf{k}_{i,t}^T \mathbf{x} + K^i_\theta(t)\big(H_{i,t}^T \mathbf{x} - H_{i,t}^T L_{i,t} \mathbf{x}\big). \qquad (33)$$

Grouping the terms that multiply $\mathbf{x}$ on the right hand side of the two equations, we observe that
$E_{i,t+1}[\mathbf{x}] = L_{i,t+1} \mathbf{x}$ and $E_{i,t+1}[\theta] = \mathbf{k}_{i,t+1}^T \mathbf{x}$, where $L_{i,t+1}$ and $\mathbf{k}_{i,t+1}$ are as defined in (25) and (26).
Similarly, the updates for error covariance matrices are as given in (27)-(29) following standard LMMSE
updates [27, Ch. 12]. ■
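
A single step of the recursion in Lemma 2 can be sketched in plain Python for an agent with one neighbor, in which case $H_{i,t}^T M^i_{xx}(t) H_{i,t}$ is a scalar and no matrix inversion is needed. The function name `lmmse_step`, the restriction $d(i) = 1$, and the numbers below are assumptions for illustration: two agents with variances $c_1 = 1$, $c_2 = 3$, initial values of the form stated in Theorem 1, and a neighbor whose stage-0 action is assumed to be $2 x_2$.

```python
def lmmse_step(L, k, Mxx, Mtx, Mtt, h):
    """One LMMSE update for an agent with a single neighbor.

    L: N x N matrix L_{i,t};  k: length-N list k_{i,t};
    Mxx: N x N covariance;  Mtx: length-N cross covariance;  Mtt: scalar variance;
    h: length-N list holding the single row of H_{i,t}^T.
    """
    N = len(h)
    Mh = [sum(Mxx[r][c] * h[c] for c in range(N)) for r in range(N)]   # M_xx H
    s = sum(h[r] * Mh[r] for r in range(N))                            # H^T M_xx H (scalar)
    Kx = [m / s for m in Mh]                                           # gain for x
    Kt = sum(Mtx[c] * h[c] for c in range(N)) / s                      # gain for theta
    hL = [sum(h[r] * L[r][c] for r in range(N)) for c in range(N)]     # H^T L
    innov = [h[c] - hL[c] for c in range(N)]                           # H^T - H^T L
    L_new = [[L[r][c] + Kx[r] * innov[c] for c in range(N)] for r in range(N)]
    k_new = [k[c] + Kt * innov[c] for c in range(N)]
    hM = [sum(h[r] * Mxx[r][c] for r in range(N)) for c in range(N)]   # H^T M_xx
    Mxx_new = [[Mxx[r][c] - Kx[r] * hM[c] for c in range(N)] for r in range(N)]
    Mtt_new = Mtt - Kt * sum(h[c] * Mtx[c] for c in range(N))
    Mtx_new = [Mtx[c] - Kt * hM[c] for c in range(N)]
    return L_new, k_new, Mxx_new, Mtx_new, Mtt_new


# Agent 1 of 2 at t = 0 (hypothetical values): it knows x_1 and observes 2 * x_2.
L0 = [[1.0, 0.0], [1.0, 0.0]]      # 1 e_1^T
k0 = [1.0, 0.0]                    # e_1
Mxx0 = [[0.0, 0.0], [0.0, 4.0]]    # x_1 known; var(x_2 - x_1) = c_1 + c_2
Mtx0 = [0.0, 1.0]                  # c_1 * bar(e)_1^T
Mtt0 = 1.0                         # c_1
h = [0.0, 2.0]                     # observed action = 2 * x_2
L1, k1, Mxx1, Mtx1, Mtt1 = lmmse_step(L0, k0, Mxx0, Mtx0, Mtt0, h)
```

After this one observation the agent recovers $\mathbf{x}$ exactly (`L1` is the identity, `Mxx1` is zero) and `k1 = [0.75, 0.25]` is the inverse-variance weighting of the two signals, with posterior variance `Mtt1 = 0.75`.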
In the repeated game we are considering, agents determine optimal actions given available information
and determine the information that is revealed by neighboring actions. These questions are respectively
answered by Lemmas 1 and 2 under the inductive hypotheses of Gaussian beliefs and linear estimates as
per (12). The answer provided by Lemma 2 also shows that the inductive hypotheses hold true at time
$t + 1$ and provides an explicit recursion to propagate the mean and variance of the beliefs posterior to
the observation of neighboring actions. This permits closing the inductive loop to establish the following
theorem for recursive computation of BNE of repeated games with quadratic payoffs.

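Theorem 1 Consider a repeated Bayesian game with the quadratic utility function in (2) and assume
that linear strategies $\sigma^*_{i,t}(h_{i,t}) = \mathbf{v}_{i,t}^T E_{i,t}[\mathbf{x}]$ as in (13) exist for all times $t$. Then, the action coefficients
$\mathbf{v}_{i,t}$ can be computed by solving the system of linear equations in (15) with $\mathbf{v}_t := [\mathbf{v}_{1,t}^T, \ldots, \mathbf{v}_{N,t}^T]^T$,
$\mathbf{k}_t := [\mathbf{k}_{1,t}^T, \ldots, \mathbf{k}_{N,t}^T]^T$ and $L_t$ as in (14). The matrices $L_{i,t}$ and the vectors $\mathbf{k}_{i,t}$ are computed by
recursive application of (22)-(23) and (25)-(29) with initial values

$$L_{i,0} = \mathbf{1} \mathbf{e}_i^T, \qquad \mathbf{k}_{i,0} = \mathbf{e}_i. \qquad (34)$$

The initial covariance matrix $M^i_{xx}(0)$, initial variance $M^i_{\theta\theta}(0)$, and initial cross covariance $M^i_{\theta x}(0)$ are
given by

$$M^i_{xx}(0) = \mathrm{diag}(\bar{\mathbf{e}}_i)\,\mathrm{diag}(\mathbf{c}) + \bar{\mathbf{e}}_i \bar{\mathbf{e}}_i^T c_i, \qquad M^i_{\theta\theta}(0) = c_i, \qquad M^i_{\theta x}(0) = c_i \bar{\mathbf{e}}_i^T. \qquad (35)$$

Proof: See Appendix A. ■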
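
For the first stage the initialization can be worked out by hand. With $L_{i,0} = \mathbf{1} \mathbf{e}_i^T$ we have $L_{i,0}^T \mathbf{v} = \mathbf{e}_i \mathbf{1}^T \mathbf{v}$ and $\mathbf{1}^T \mathbf{e}_j = 1$, so the $i$-th block of the equilibrium equations at $t = 0$ reduces to the scalar relation $s_i - \sum_{j \neq i} \beta_{ij} s_j = \delta$ with $s_i := \mathbf{1}^T \mathbf{v}_{i,0}$, and the stage-0 action becomes $a^*_i(0) = \mathbf{v}_{i,0}^T \mathbf{1} \mathbf{e}_i^T \mathbf{x} = s_i x_i$. This reduction is our own substitution and is not spelled out in the text. The sketch below solves the reduced system by a fixed-point iteration, a generic numerical choice that is assumed to converge because $\sum_{j \neq i} |\beta_{ij}| < 1$ in the hypothetical example; the function name and all values are illustrative.

```python
def stage0_action_gains(beta, delta, sweeps=200):
    """Iterate s_i <- delta + sum_{j != i} beta_ij * s_j (reduced stage-0 system).

    Convergence is assumed via sum_{j != i} |beta_ij| < 1 for every agent i.
    """
    n = len(beta)
    s = [0.0] * n
    for _ in range(sweeps):
        s = [delta + sum(beta[i][j] * s[j] for j in range(n) if j != i)
             for i in range(n)]
    return s


beta = [[0.0, 0.5], [0.5, 0.0]]   # hypothetical payoff weights
x = [1.0, 3.0]                    # hypothetical private signals
s = stage0_action_gains(beta, delta=1.0)
actions0 = [s_i * x_i for s_i, x_i in zip(s, x)]   # a_i(0) = s_i * x_i
```

Here each agent scales its own signal by $s_i = \delta / (1 - 0.5) = 2$, since at $t = 0$ its best estimate of every other signal is its own.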
According to Theorem 1 the beliefs on $\theta$ and $\mathbf{x}$ remain Gaussian for all agents and all times when
agents play according to a linear equilibrium strategy as in (13) at each stage. Theorem 1 also provides a
recursive mechanism to compute the coefficients $\mathbf{v}_{i,t}$ of the linear BNE strategies $\sigma^*_{i,t}(h_{i,t}) = \mathbf{v}_{i,t}^T E_{i,t}[\mathbf{x}]$
and the coefficients $L_{i,t}$ and $\mathbf{k}_{i,t}$ that determine the LMMSE estimates as per (12). However, these latter
expressions cannot be used by agent $i$ to calculate estimates $E_{i,t}[\mathbf{x}]$ and $E_{i,t}[\theta]$ unless the private signals
$\mathbf{x}$ are exactly known, which would make the estimation process unnecessary for agent $i$.
Since the BNE action $a^*_i(t) = \sigma^*_{i,t}(h_{i,t}) = \mathbf{v}_{i,t}^T E_{i,t}[\mathbf{x}]$ depends on having the private signal
estimate $E_{i,t}[\mathbf{x}]$ available, Theorem 1 does not provide a way of computing the optimal action either.
This mismatch can be solved by writing the LMMSE updates in a different form as we show in the next
section after the following remark.

Fig. 1. Quadratic Network Game (QNG) filter at agent i. There are two types of blocks, circle and rectangle. Arrows coming into a circle block are summed. An arrow that goes into a rectangle block is multiplied by the coefficient written inside the block. Inside the dashed box agent i's mean estimate updates on $\mathbf{x}$ and $\theta$ are illustrated (cf. (36) and (37)). The gain coefficients for the mean updates are fed from the LMMSE block in Fig. 2. The observation matrix $H_{i,t}$ is fed from the game block in Fig. 2. Agent i multiplies his mean estimate on $\mathbf{x}$ at time t with the action coefficient $\mathbf{v}_{i,t}$, which is fed from the game block in Fig. 2, to obtain $a_i(t)$. The mean estimates $\mathbb{E}_{i,t}[\mathbf{x}]$ and $a_i(t)$ can only be calculated by agent i. 

Remark 3 Results in this paper assume the system of linear equations in (15) has a unique solution. If the solution is not unique, a prior agreement is necessary for agents to play consistent strategies. E.g., agents could agree beforehand to select the vector with minimum Euclidean norm. If (15) does not have a solution, it means that equilibrium strategies of the form in (20) do not exist. A sufficient condition for this not to happen is to have a strictly diagonally dominant utility function, which in explicit terms we write $\sum_{j \in V\setminus\{i\}} |\beta_{ij}| < 1$. In that case Gershgorin's Theorem implies that $L_t$ is full rank because it has no null eigenvalues. Laxer conditions to guarantee existence of linear equilibria as in (20) can be found in, e.g., [28], [29]. In all of our numerical experiments solutions to (15) exist and are unique. 
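The two devices mentioned in Remark 3 can be illustrated in a few lines of Python. This is a sketch under stated assumptions: the dominance check operates on a hypothetical coefficient matrix `beta` with entries $\beta_{ij}$, and the minimum-norm selection is shown on a generic system $A\mathbf{v} = \mathbf{b}$ standing in for (15), not on the matrix $L_t$ of (14) itself. The function names are illustrative only.

```python
import numpy as np

def is_strictly_diagonally_dominant(beta):
    """Check sum_{j != i} |beta_ij| < 1 for every agent i (the diagonal of beta is ignored)."""
    off = np.abs(np.asarray(beta, dtype=float))   # np.abs returns a new array
    np.fill_diagonal(off, 0.0)
    return bool(np.all(off.sum(axis=1) < 1.0))

def min_norm_solution(A, b):
    """Minimum Euclidean norm solution of a consistent system A v = b via the pseudo-inverse."""
    return np.linalg.pinv(A) @ b
```

When the check succeeds, the matrix $I - [\beta_{ij}]$ of the complete-information best-response system is strictly diagonally dominant and hence nonsingular by Gershgorin's Theorem. This is a simpler stand-in for the full-rank property of $L_t$ discussed in the remark.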



IV. Quadratic Network Game Filter 
To compute and play BNE strategies each node runs the quadratic network game (QNG) filter that we derive in this section. Since agent i cannot use (12), we need an alternative means of computing the estimates $\mathbb{E}_{i,t}[\mathbf{x}]$ and $\mathbb{E}_{i,t}[\theta]$. To do this, refer to the transformation of (30) and (31) into (32) and (33) in the proof of Lemma 2. In this transformation we substitute the observed neighboring actions $\mathbf{a}_{n(i)}(t)$ for their model $\mathbf{a}_{n(i)}(t) = H_{i,t}^T\mathbf{x}$ and write the expectation of these actions as $H_{i,t}^T\mathbb{E}_{i,t}[\mathbf{x}]$ with the further substitution $\mathbb{E}_{i,t}[\mathbf{x}] = L_{i,t}\mathbf{x}$. If we keep the observed actions and the estimate $\mathbb{E}_{i,t}[\mathbf{x}]$ as they are, instead of expressing them in terms of $\mathbf{x}$, we can rewrite (30) and (31) as 

$$\mathbb{E}_{i,t+1}[\mathbf{x}] = \mathbb{E}_{i,t}[\mathbf{x}] + K^i_{\mathbf{x}}(t)\left(\mathbf{a}_{n(i)}(t) - H_{i,t}^T\mathbb{E}_{i,t}[\mathbf{x}]\right), \qquad (36)$$

$$\mathbb{E}_{i,t+1}[\theta] = \mathbb{E}_{i,t}[\theta] + K^i_{\theta}(t)\left(\mathbf{a}_{n(i)}(t) - H_{i,t}^T\mathbb{E}_{i,t}[\mathbf{x}]\right). \qquad (37)$$

The updates in (36) and (37) can be implemented locally by agent i since they depend on the previous values $\mathbb{E}_{i,t}[\mathbf{x}]$ and $\mathbb{E}_{i,t}[\theta]$ of the LMMSE estimates, and the observed neighboring actions $\mathbf{a}_{n(i)}(t)$. They can be combined with the coefficient recursions in (15), (22)-(23), and (25)-(29) as well as with the BNE strategy expression in (13) to recursively compute the equilibrium actions $a^*_i(t)$ given the observed history $h_{i,t}$. 
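The local mean updates (36) and (37) amount to one innovation computation shared by two gain multiplications. A minimal Python sketch follows, assuming NumPy and that the gains and the observation matrix have already been computed; the function name `qng_mean_update` and the array layouts are choices of this sketch, not of the paper.

```python
import numpy as np

def qng_mean_update(Ex, Etheta, Kx, Ktheta, Ht, a_obs):
    """One step of (36)-(37) for agent i.

    Ex:      (N,)   current estimate E_{i,t}[x]
    Etheta:  float  current estimate E_{i,t}[theta]
    Kx:      (N, d) gain K_x^i(t);  Ktheta: (d,) gain K_theta^i(t)
    Ht:      (d, N) observation matrix H_{i,t}^T
    a_obs:   (d,)   observed neighboring actions a_{n(i)}(t)
    """
    innovation = a_obs - Ht @ Ex          # observed minus predicted actions
    return Ex + Kx @ innovation, Etheta + Ktheta @ innovation
```

Only quantities available to agent i appear in the arguments: its previous estimates, the observed actions, and coefficients that do not depend on the realization of the private signals.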



The updates in (13), (15), (22)-(23), (25)-(29), and (36)-(37) form the QNG filter. In the QNG filter agent i performs a full network simulation in which it maintains a belief $P_{i,t}([\theta, \mathbf{x}^T])$ on the state of the world $\theta$ and the private signals $\mathbf{x}$ of all agents. This implies performing the coefficient updates (15), (22)-(23), (25)-(29) for all agents in the network. Agent i can do this because the network topology and private signal models are common knowledge. The updates (13) and (36)-(37) are performed for agent i's own index only. 

The signal updates in (36)-(37) are illustrated inside the dashed box in Fig. 1. At time t, the inputs to the filter are the observed actions $\mathbf{a}_{n(i)}(t)$ of agent i's neighbors. The prediction $\mathbb{E}_{i,t}[\mathbf{a}_{n(i)}(t)] = H_{i,t}^T\mathbb{E}_{i,t}[\mathbf{x}]$ of this vector is subtracted from the observed value and the resultant error is fed into two parallel blocks respectively tasked with updating the belief $\mathbb{E}_{i,t}[\theta]$ on the state of the world $\theta$, and the belief $\mathbb{E}_{i,t}[\mathbf{x}]$ on the private signals $\mathbf{x}$ of other agents. The error $\mathbf{a}_{n(i)}(t) - \mathbb{E}_{i,t}[\mathbf{a}_{n(i)}(t)]$ is multiplied by the gain $K^i_{\mathbf{x}}(t)$ and the resultant innovation is added to the previous mean estimate to correct the estimate of $\mathbf{x}$ [cf. (36)]. Similarly, the error is multiplied by the gain $K^i_{\theta}(t)$ and the resultant innovation is added to the previous mean estimate to correct the estimate of $\theta$ [cf. (37)]. In order to determine the equilibrium play as per (13), agent i multiplies his private signal estimate $\mathbb{E}_{i,t}[\mathbf{x}]$ by the vector $\mathbf{v}_{i,t}$ obtained by solving the system of linear equations in (15). 

Observe that in the QNG filter, we do not use the fact that the estimates $\mathbb{E}_{i,t}[\theta]$ and $\mathbb{E}_{i,t}[\mathbf{x}]$ as well as the actions $a_i(t)$ can be written as linear combinations of the private signals [cf. (12) and (20)]. While the expressions in (12) and (20) are certainly correct, they cannot be used for implementation because $\mathbf{x}$ is only partially known to agent i. The role of (12) and (20) is to allow derivation of recursions that we use to keep track of the gains used in the QNG filter. These recursions can be divided into a group of LMMSE updates and a group of game updates as we show in Fig. 2. 

As it follows from (22)-(23) and (27)-(29), the update of LMMSE coefficients is identical to the gain and covariance updates of a sequential LMMSE. The only peculiarity is that the observation matrix $H_{j,t}$ is fed from the game update block and is partially determined by the LMMSE gains and covariances of previous iterations. Nevertheless, this peculiarity is more associated with the game block than with the LMMSE block. The game block uses (25) and (26) to keep track of the matrices $L_{j,t}$ and the vectors $\mathbf{k}_{j,t}$. The matrices $L_{j,t}$ are used as building blocks of the matrix $L_t$ and the vectors $\mathbf{k}_{j,t}$ are stacked in the vector $\mathbf{k}_t$ and used to formulate the system of equations in (15). Solving this system of equations, using $L_t^{-1}$ when $L_t$ is full rank or its pseudo-inverse when it is not, yields the coefficients $\mathbf{v}_{j,t}$ which in turn determine the observation matrix $H_{j,t}$ as per (21). As mentioned before, the game block feeds the matrices $H_{j,t}$ to the LMMSE block as they are used in the LMMSE gain and covariance updates. The LMMSE block feeds the gains $K^j_{\mathbf{x}}(t)$ and $K^j_{\theta}(t)$ to the game block as these are needed to update $L_{j,t}$ and $\mathbf{k}_{j,t}$. 

We remark that agent i keeps track of the matrices and vectors in Fig. 2 for all $j \in V$. I.e., agent i calculates the observation matrices $H_{j,t}$ for $j \in V$ in the game block, which are fed into the LMMSE block to obtain the gain matrices $K^j_{\mathbf{x}}(t)$ and $K^j_{\theta}(t)$ for all $j \in V$. These gains are fed into the game block from the LMMSE block as they are needed to update $L_{j,t}$ and $\mathbf{k}_{j,t}$ for all $j \in V$. The reason for this is the step in the game block in which we compute the play coefficients $\mathbf{v}_{j,t}$. To solve this system of equations, agent i needs to build the matrix $L_t$ that is formed by the blocks $L_{j,t}$ of all agents. All of these computations for the coefficients of other agents are internal to agent i and independent of the game realization. The gains can therefore be computed offline prior to running the game. 

Remark 4 The QNG filter can also be used in repeated games with purely informational externalities. In this case each agent's payoff is given by $u_i(\theta, a_i) = -(\theta - a_i)^2$, and the problem is thus equivalent to the distributed estimation of the world state $\theta$ [4]. Our model subsumes the games with purely informational externalities as a special case. Given this payoff function, the best response of agent i at time t is the action $a_i(t) = \mathbb{E}_{i,t}[\theta]$. Hence, it is not necessary to solve (15) for the optimal strategy coefficients $\mathbf{v}_{i,t}$. Other than this the QNG filter remains unchanged. Since in the case of purely informational externalities the end goal is the estimation of $\theta$, the QNG filter is tantamount to an optimal distributed implementation of a Kalman filter. 



V. Vector states and vector observations 

Consider the case when the state of the world is a vector, that is, $\theta \in \mathbb{R}^m$ for $m > 1$. Similar to the scalar case, each agent receives an initial private signal $\mathbf{x}_i \in \mathbb{R}^m$, 

$$\mathbf{x}_i = \theta + \boldsymbol{\epsilon}_i \qquad (38)$$

where the additive noise term $\boldsymbol{\epsilon}_i \in \mathbb{R}^m$ is multivariate Gaussian with zero mean and variance-covariance matrix $C_i \in \mathbb{R}^{m\times m}$. For future reference, define the vector obtained by stacking the elements at the kth row and lth column of the variance-covariance matrices of all agents, $\mathbf{c}_{k,l} := [C_1[k,l], \ldots, C_N[k,l]]^T$. We use $x_i[n]$ to denote the nth private signal of agent i where $n \le m$. We assume that private signals are 

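February 4, 2013 DRAFT 

Game coefficients (variable: update) 
$L_{j,t}$: $L_{j,t+1} = L_{j,t} + K^j_{\mathbf{x}}(t)\left(H_{j,t}^T - H_{j,t}^T L_{j,t}\right)$ (25) 
$\mathbf{k}_{j,t}$: $\mathbf{k}_{j,t+1}^T = \mathbf{k}_{j,t}^T + K^j_{\theta}(t)\left(H_{j,t}^T - H_{j,t}^T L_{j,t}\right)$ (26) 
$\mathbf{v}_t$: $L_t\mathbf{v}_t = \delta\mathbf{k}_t$ (15) 
$H_{j,t}$: (21) 

LMMSE coefficients (variable: update) 
$K^j_{\mathbf{x}}(t)$: $K^j_{\mathbf{x}}(t) = M^j_{\mathbf{xx}}(t)H_{j,t}\left(H_{j,t}^T M^j_{\mathbf{xx}}(t)H_{j,t}\right)^{-1}$ (22) 
$K^j_{\theta}(t)$: $K^j_{\theta}(t) = M^j_{\theta\mathbf{x}}(t)H_{j,t}\left(H_{j,t}^T M^j_{\mathbf{xx}}(t)H_{j,t}\right)^{-1}$ (23) 
$M^j_{\mathbf{xx}}(t)$: $M^j_{\mathbf{xx}}(t+1) = M^j_{\mathbf{xx}}(t) - K^j_{\mathbf{x}}(t)H_{j,t}^T M^j_{\mathbf{xx}}(t)$ (27) 
$M^j_{\theta\mathbf{x}}(t)$: $M^j_{\theta\mathbf{x}}(t+1) = M^j_{\theta\mathbf{x}}(t) - K^j_{\theta}(t)H_{j,t}^T M^j_{\mathbf{xx}}(t)$ (29) 

Outputs to the QNG filter: $\mathbf{v}_{i,t}$ and $H_{i,t}$ from the game block; the gains $K^i_{\mathbf{x}}(t)$ and $K^i_{\theta}(t)$ from the LMMSE block. 

Fig. 2. Propagation of gains required to implement the Quadratic Network Game (QNG) filter of Fig. 1. Gains are separated into interacting LMMSE and game blocks. All agents perform a full network simulation in which they compute the gains of all other agents. This is necessary because when we compute the play coefficients $\mathbf{v}_{j,t}$ in the game block, agent i builds the matrix $L_t$ that is formed by the blocks $L_{j,t}$ of all agents [cf. (14)]. This full network simulation is possible because the network topology and private signal models are common knowledge. 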



independent among agents, that is, $\mathbb{E}_{i,0}[\boldsymbol{\epsilon}_i\boldsymbol{\epsilon}_j^T] = 0$ for all $i \in V$ and $j \in V\setminus\{i\}$. We define the set of all private signals as 

$$\mathbf{x} := [x_1[1], \ldots, x_N[1], \ldots, x_1[m], \ldots, x_N[m]]^T \qquad (39)$$

where $\mathbf{x} \in \mathbb{R}^{Nm\times 1}$. We use $\mathbf{x}[n] := [x_1[n], \ldots, x_N[n]]^T$ to denote the vector of private signals of agents on the nth state of the world. 

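At each stage t, agent i takes action $\mathbf{a}_i(t) \in \mathbb{R}^m$. Agent i's action at time t is chosen to maximize a payoff that is represented by the following quadratic function 

$$u_i(\mathbf{a}_i, \{\mathbf{a}_j\}_{j\in V\setminus i}, \theta) = -\frac{1}{2}\sum_{j\in V}\mathbf{a}_j^T\mathbf{a}_j + \sum_{j\in V\setminus\{i\}}\mathbf{a}_i^T B_{ij}\mathbf{a}_j + \mathbf{a}_i^T D\theta, \qquad (40)$$

where the constants $B_{ij}$ and $D$ belong to $\mathbb{R}^{m\times m}$. Similar to the scalar case, other additive terms that depend on $\{\mathbf{a}_j\}_{j\in V\setminus i}$ and $\theta$ can exist without changing the results to follow. We obtain the best response function for agent i by taking the derivative of the expected utility function with respect to $\mathbf{a}_i$, equating it to zero, and solving for $\mathbf{a}_i$: 

$$BR_{i,t}\left(\{\mathbf{a}_{j,t}(h_{j,t})\}_{j\in V\setminus i}\right) = \sum_{j\in V\setminus\{i\}} B_{ij}\,\mathbb{E}_{i,t}[\mathbf{a}_{j,t}(h_{j,t})] + D\,\mathbb{E}_{i,t}[\theta]. \qquad (41)$$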

Note that BR* : M^'" ^ M^™. 

Similar to the case when the unknown parameter is a scalar, it is sufficient for agents to keep track of estimates of $\mathbf{x}$ in order to achieve the best estimate of $\theta$. Accordingly, the definitions of the estimates of private signals and the unknown parameters and their corresponding covariance matrices (9)-(11) are the same as in the scalar case. 

In what follows, we show that the mean estimates are linear in private signals and that equilibrium actions are linear in expectations of private signals, in the same fashion as we did for the scalar state of the world. 



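Lemma 3 Consider a Bayesian game with quadratic utility as in (40). Suppose that for all agents i, the joint posterior beliefs on the state of the world $\theta$ and the private signals $\mathbf{x}$ given the local history $h_{i,t}$ at time t, $P_{i,t}([\theta^T, \mathbf{x}^T])$, are Gaussian with means expressed as 

$$\mathbb{E}_{i,t}[\theta] = Q_{i,t}\mathbf{x}, \quad \text{and} \quad \mathbb{E}_{i,t}[\mathbf{x}] = L_{i,t}\mathbf{x}, \qquad (42)$$

where $L_{i,t} \in \mathbb{R}^{Nm\times Nm}$ and $Q_{i,t} \in \mathbb{R}^{m\times Nm}$ are known estimation weights. If there exists an equilibrium strategy profile that is linear in expectations of private signals, 

$$\mathbf{a}^*_{i,t}(h_{i,t}) = U_{i,t}\mathbb{E}_{i,t}[\mathbf{x}] \quad \text{for all } i \in V, \qquad (43)$$

then the action coefficients $\{U_{i,t}\}_{i\in V}$ can be obtained by solving the system of linear equations 

$$L_{i,t}^T U_{i,t}^T = \sum_{j\in V\setminus i} L_{i,t}^T L_{j,t}^T U_{j,t}^T B_{ij}^T + Q_{i,t}^T D^T, \quad \text{for all } i \in V. \qquad (44)$$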

Proof: The proof is analogous to the proof of Lemma 1. By substituting the candidate strategies in (43) into the best response function in (41) for all $i \in V$, we obtain the following equilibrium equations 

$$U_{i,t}\mathbb{E}_{i,t}[\mathbf{x}] = \sum_{j\in V\setminus\{i\}} B_{ij}\,\mathbb{E}_{i,t}\left[U_{j,t}\mathbb{E}_{j,t}[\mathbf{x}]\right] + D\,\mathbb{E}_{i,t}[\theta] \qquad (45)$$

for all $i \in V$. After using the fact that $\mathbb{E}_{i,t}[\mathbb{E}_{j,t}[\mathbf{x}]] = L_{j,t}\mathbb{E}_{i,t}[\mathbf{x}]$ with the mean estimate assumptions in (42) for the corresponding terms in (45), we obtain the following set of equations 

$$U_{i,t}L_{i,t}\mathbf{x} = \sum_{j\in V\setminus\{i\}} B_{ij}U_{j,t}L_{j,t}L_{i,t}\mathbf{x} + DQ_{i,t}\mathbf{x}. \qquad (46)$$

We ensure that the strategies in (43) satisfy the equilibrium equations for any realization of history by equating the coefficients that multiply each component of $\mathbf{x}$ in (46), which after transposition yields the set of equations given by (44). ■ 
For a linear equilibrium strategy, the actions can be written as a linear combination of the private signals using (42), that is, the action of agent i at time t is given by 

$$\mathbf{a}_i(t) = U_{i,t}L_{i,t}\mathbf{x} \quad \text{for all } i \in V. \qquad (47)$$

Being able to express actions as in (47) permits writing observations of agents in linear form. From the perspective of an observer, the action $\mathbf{a}_i(t)$ is equivalent to observing a linear combination of private signals. As a result, we can represent the observation vector of agent i, $\mathbf{a}_{n(i)}(t) := [\mathbf{a}_{j_1}(t)^T, \ldots, \mathbf{a}_{j_{d(i)}}(t)^T]^T \in \mathbb{R}^{m d(i)}$, in linear form as 

$$\mathbf{a}_{n(i)}(t) = H_{i,t}^T\mathbf{x} = \left[U_{j_1,t}L_{j_1,t}; \ldots; U_{j_{d(i)},t}L_{j_{d(i)},t}\right]\mathbf{x} \qquad (48)$$

where $H_{i,t}^T := \left[U_{j_1,t}L_{j_1,t}; \ldots; U_{j_{d(i)},t}L_{j_{d(i)},t}\right] \in \mathbb{R}^{m d(i)\times Nm}$ is the observation matrix of agent i. 

Agent i's belief of $\mathbf{x}$ at time t is normal, and at time t + 1 agent i observes a linear combination of $\mathbf{x}$. Hence, agent i's belief at time t + 1 can be obtained by a sequential LMMSE update. As a result, mean estimates remain weighted sums of private signals as in (42). In the following lemma, we explicitly present the way we compute the estimation weights, $L_{i,t+1}$ and $Q_{i,t+1}$, at time t + 1 when $\theta \in \mathbb{R}^m$. 



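Lemma 4 Consider a Bayesian game with quadratic function as in (40) and the same assumptions and definitions of Lemma 3. Further define the gain matrices as 

$$K^i_{\mathbf{x}}(t) := M^i_{\mathbf{xx}}(t)H_{i,t}\left(H_{i,t}^T M^i_{\mathbf{xx}}(t)H_{i,t}\right)^{-1}, \qquad (49)$$

$$K^i_{\theta}(t) := M^i_{\theta\mathbf{x}}(t)H_{i,t}\left(H_{i,t}^T M^i_{\mathbf{xx}}(t)H_{i,t}\right)^{-1}. \qquad (50)$$

If agents play according to a linear equilibrium strategy then agent i's posterior $P_{i,t+1}([\theta^T, \mathbf{x}^T])$ is Gaussian with means that are linear combinations of private signals, 

$$\mathbb{E}_{i,t+1}[\theta] = Q_{i,t+1}\mathbf{x}, \quad \text{and} \quad \mathbb{E}_{i,t+1}[\mathbf{x}] = L_{i,t+1}\mathbf{x}, \qquad (51)$$

where the estimation matrices are given by 

$$L_{i,t+1} = L_{i,t} + K^i_{\mathbf{x}}(t)\left(H_{i,t}^T - H_{i,t}^T L_{i,t}\right), \qquad (52)$$

$$Q_{i,t+1} = Q_{i,t} + K^i_{\theta}(t)\left(H_{i,t}^T - H_{i,t}^T L_{i,t}\right), \qquad (53)$$

and the covariance matrices are further given by 

$$M^i_{\mathbf{xx}}(t+1) = M^i_{\mathbf{xx}}(t) - K^i_{\mathbf{x}}(t)H_{i,t}^T M^i_{\mathbf{xx}}(t), \qquad (54)$$

$$M^i_{\theta\theta}(t+1) = M^i_{\theta\theta}(t) - K^i_{\theta}(t)H_{i,t}^T M^i_{\mathbf{x}\theta}(t), \qquad (55)$$

$$M^i_{\theta\mathbf{x}}(t+1) = M^i_{\theta\mathbf{x}}(t) - K^i_{\theta}(t)H_{i,t}^T M^i_{\mathbf{xx}}(t). \qquad (56)$$

Proof: The proof is identical to the proof of Lemma 2 with the action coefficients $U_{i,t}$ taking the place of $\mathbf{v}_{i,t}$. ■ 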
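One application of the recursions in Lemma 4 can be sketched as follows. The code assumes NumPy, takes the transposed observation matrix $H_{i,t}^T$ as input, uses $M^i_{\mathbf{x}\theta} = (M^i_{\theta\mathbf{x}})^T$, and replaces the inverse in (49)-(50) by a pseudo-inverse so that the step stays defined when the innovation covariance is singular. That replacement and the function name `lmmse_step` are assumptions of this sketch.

```python
import numpy as np

def lmmse_step(L, Q, Mxx, Mtx, Mtt, Ht):
    """Apply (49)-(50) and (52)-(56) once.

    Shapes: L (Nm, Nm), Q (m, Nm), Mxx (Nm, Nm), Mtx (m, Nm), Mtt (m, m),
    Ht (m*d, Nm) is the transposed observation matrix H_{i,t}^T.
    """
    H = Ht.T
    S_inv = np.linalg.pinv(Ht @ Mxx @ H)      # pseudo-inverse instead of inverse (sketch assumption)
    Kx = Mxx @ H @ S_inv                      # (49)
    Kt = Mtx @ H @ S_inv                      # (50)
    resid = Ht - Ht @ L
    L_new = L + Kx @ resid                    # (52)
    Q_new = Q + Kt @ resid                    # (53)
    Mxx_new = Mxx - Kx @ Ht @ Mxx             # (54)
    Mtt_new = Mtt - Kt @ Ht @ Mtx.T           # (55)
    Mtx_new = Mtx - Kt @ Ht @ Mxx             # (56)
    return L_new, Q_new, Mxx_new, Mtx_new, Mtt_new
```

As a small sanity check with $N = 2$, $m = 1$ and noise variances 1 and 3, letting agent 1 observe a scaled copy of $x_2$ yields $L = I$ and the inverse-variance weights $Q = [0.75, 0.25]$, as expected for a two-signal estimate of $\theta$.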
Lemma 4 shows that when mean estimates are linear combinations of private signals at time t, they remain that way at time t + 1. In the next theorem, we show that the assumption in (42) is indeed true for all times by noting that the estimates at time t = 0 are linear combinations of private signals. To simplify presentation of initial conditions, we assume that agent i's private signals are independent, $\mathbb{E}_{i,0}[x_i[k]x_i[l]] = 0$ for all $k = 1, \ldots, m$ and $l \neq k$. 



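Theorem 2 Given the quadratic utility function in (40), if there exists a linear equilibrium strategy $\mathbf{a}^*_{i,t}$ as in (43) for $t \in \mathbb{N}$, then the action coefficients $U_{i,t}$ can be computed by solving the system of linear equations in (44), and further, agents' estimates of $\mathbf{x}$ and $\theta$ are linear combinations of private signals as in (42) with estimation matrices computed recursively using (49)-(50) and (52)-(56) with initial values 

$$Q_{i,0} = \begin{pmatrix} \mathbf{e}_i^T & \mathbf{0}_{1\times N} & \cdots & \mathbf{0}_{1\times N} \\ \mathbf{0}_{1\times N} & \mathbf{e}_i^T & \cdots & \mathbf{0}_{1\times N} \\ \vdots & & \ddots & \vdots \\ \mathbf{0}_{1\times N} & \cdots & \mathbf{0}_{1\times N} & \mathbf{e}_i^T \end{pmatrix} \in \mathbb{R}^{m\times Nm}, \qquad (57)$$

$$L_{i,0} := \mathrm{diag}\left([\mathbf{1}\mathbf{e}_i^T, \ldots, \mathbf{1}\mathbf{e}_i^T]\right) \in \mathbb{R}^{Nm\times Nm}, \qquad (58)$$

where $\mathbf{e}_i \in \mathbb{R}^{N}$. The initial covariance matrix $M^i_{\mathbf{xx}}(0) \in \mathbb{R}^{Nm\times Nm}$ is a diagonal block matrix with $N\times N$ blocks $(M^i_{\mathbf{xx}})_{k,k} \in \mathbb{R}^{N\times N}$ for $k = 1, \ldots, m$. These blocks, the initial variance $M^i_{\theta\theta}(0) \in \mathbb{R}^{m\times m}$ and the initial cross covariance $M^i_{\theta\mathbf{x}}(0) \in \mathbb{R}^{m\times Nm}$ are given by 

$$(M^i_{\mathbf{xx}})_{k,k} = \mathrm{diag}(\bar{\mathbf{e}}_i)\,\mathrm{diag}(\mathbf{c}_{k,k}) + \bar{\mathbf{e}}_i\bar{\mathbf{e}}_i^T C_i[k,k], \qquad (59)$$

$$M^i_{\theta\theta}(0) = C_i, \qquad (60)$$

$$M^i_{\theta\mathbf{x}}(0) = C_i \begin{pmatrix} \bar{\mathbf{e}}_i^T & \mathbf{0}_{1\times N} & \cdots & \mathbf{0}_{1\times N} \\ \mathbf{0}_{1\times N} & \bar{\mathbf{e}}_i^T & \cdots & \mathbf{0}_{1\times N} \\ \vdots & & \ddots & \vdots \\ \mathbf{0}_{1\times N} & \cdots & \mathbf{0}_{1\times N} & \bar{\mathbf{e}}_i^T \end{pmatrix}. \qquad (61)$$

Proof: See Appendix B. ■ 

Algorithm 1 QNG filter for $\theta \in \mathbb{R}^m$ 
Initialization: Set posterior distribution on $\theta$ and $\mathbf{x}$ and $\{L_{j,0}, Q_{j,0}\}_{j\in V}$ according to (57) and (58). 
For t = 0, 1, 2, ... 
1) Equilibrium strategy: Solve for $\{U_{j,t}\}_{j\in V}$ using the set of equations in (44). 
2) Play and observe: Take action $\mathbf{a}_i(t) = U_{i,t}\mathbb{E}_{i,t}[\mathbf{x}]$ and observe $\mathbf{a}_{n(i)}(t)$. 
3) Observation matrix: Construct $H_{i,t}$ using (48). 
4) Bayesian estimates: Update $\mathbb{E}_{i,t}[\mathbf{x}]$ and $\mathbb{E}_{i,t}[\theta]$ using (36) and (37), respectively. Update error covariance matrices using (54)-(56). 
5) Estimation weights: Update $\{L_{j,t}, Q_{j,t}\}_{j\in V}$ using (52)-(53). 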
Similar to the scalar case, when the network structure and the equilibrium strategy profile are common knowledge, agent i can calculate the weights $\{U_{j,t}\}_{j\in V}$ for all t and update his estimates locally. In Algorithm 1 we provide a sequential local algorithm for agent i to calculate updates for $\theta$ and $\mathbf{x}$ and to act according to the equilibrium strategy. The Bayesian rational learning defined in Algorithm 1 for the vector state case follows the same steps as the scalar case defined in Section IV and in Figs. 1 and 2. 
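To make the steps of the algorithm concrete, the following Python sketch simulates all agents running the filter in the scalar special case $m = 1$, where $U_{i,t}$ reduces to $\mathbf{v}_{i,t}^T$, $Q_{i,t}$ to $\mathbf{k}_{i,t}^T$, $B_{ij}$ to $\beta_{ij}$ and $D$ to $\delta$. It is an illustration under stated assumptions, not the authors' implementation: the equilibrium system is assembled from (44) with $m = 1$ and solved in the minimum-norm least-squares sense, the inverse in the gains is replaced by a pseudo-inverse with an absolute eigenvalue cutoff, and the names `safe_pinv` and `qng_filter_scalar` are hypothetical.

```python
import numpy as np

def safe_pinv(S, tol=1e-9):
    """Pseudo-inverse of a symmetric PSD matrix; eigenvalues below tol are treated as zero."""
    w, V = np.linalg.eigh((S + S.T) / 2.0)
    inv_w = np.where(w > tol, 1.0 / np.maximum(w, tol), 0.0)
    return (V * inv_w) @ V.T

def qng_filter_scalar(x, c, neighbors, beta, delta, stages):
    """Simulate every agent's filter; returns (actions per stage, final estimates of theta)."""
    x, c = np.asarray(x, dtype=float), np.asarray(c, dtype=float)
    N = x.size
    one, I = np.ones(N), np.eye(N)
    L = [np.outer(one, I[i]) for i in range(N)]               # L_{i,0} = 1 e_i^T
    k = [I[i].copy() for i in range(N)]                       # k_{i,0} = e_i
    Mxx = [np.diag((one - I[i]) * c) + c[i] * np.outer(one - I[i], one - I[i])
           for i in range(N)]                                 # initial covariance
    Mtx = [c[i] * (one - I[i]) for i in range(N)]             # initial cross covariance
    Ex = [x[i] * one for i in range(N)]                       # E_{i,0}[x]
    Eth = [x[i] for i in range(N)]                            # E_{i,0}[theta]
    history = []
    for _ in range(stages):
        # 1) Equilibrium: block row i is L_i^T v_i - sum_j beta_ij L_i^T L_j^T v_j = delta k_i.
        A = np.zeros((N * N, N * N))
        b = np.concatenate([delta * k[i] for i in range(N)])
        for i in range(N):
            for j in range(N):
                block = L[i].T if i == j else -beta[i][j] * L[i].T @ L[j].T
                A[i * N:(i + 1) * N, j * N:(j + 1) * N] = block
        v = np.linalg.lstsq(A, b, rcond=1e-9)[0].reshape(N, N)   # row i holds v_{i,t}
        # 2) Play: a_i(t) = v_{i,t}^T E_{i,t}[x].
        actions = np.array([v[i] @ Ex[i] for i in range(N)])
        history.append(actions)
        rows = [v[j] @ L[j] for j in range(N)]                # v_j^T L_j, computed before any update
        for i in range(N):
            Ht = np.array([rows[j] for j in neighbors[i]])    # 3) observation matrix H_{i,t}^T
            S_inv = safe_pinv(Ht @ Mxx[i] @ Ht.T)
            Kx = Mxx[i] @ Ht.T @ S_inv
            Kt = Mtx[i] @ Ht.T @ S_inv
            innovation = actions[neighbors[i]] - Ht @ Ex[i]
            Ex[i] = Ex[i] + Kx @ innovation                   # 4) mean updates (36)-(37)
            Eth[i] = Eth[i] + Kt @ innovation
            resid = Ht - Ht @ L[i]
            L[i] = L[i] + Kx @ resid                          # 5) estimation weights
            k[i] = k[i] + Kt @ resid
            Mtx[i] = Mtx[i] - Kt @ Ht @ Mxx[i]                # covariance updates
            Mxx[i] = Mxx[i] - Kx @ Ht @ Mxx[i]
    return np.array(history), np.array(Eth)
```

In a real deployment each agent would run only its own mean updates on observed actions. The simulation tracks all agents at once purely to generate those observations.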

VI. Cournot Competition 

In a Cournot competition model N firms produce a common good that they sell in a market with limitless demand. The cost per production unit c is common for all firms and constant for all times. The selling unit price, however, decreases as the total amount of goods produced by all companies increases. We adopt the specific linear model $p - \sum_{j\in V} a_j$ for the selling unit price, where p is the constant market price when no goods are produced. The profit of firm i for production level $a_i \in \mathbb{R}^+$ is therefore given by the utility 

$$u_i(a_i, \{a_j\}_{j\in V\setminus i}, \theta) = -c\,a_i + \Big(p - a_i - \sum_{j\in V\setminus i} a_j\Big)a_i. \qquad (62)$$

Fig. 3. Line, star and ring networks. 

The utility function in (62) is not of the quadratic form given in (2) because there are two information externalities, the cost c and the clearing price p. While it is possible to resort to the vector form of the QNG filter covered in Section V, it is simpler to write (62) in a form compatible with (2) by defining the parameter $\theta := p - c$ as the effective unit profit at the market price. Using this definition in (62) and reordering terms yields 

$$u_i(a_i, \{a_j\}_{j\in V\setminus i}, \theta) = \Big(\theta - a_i - \sum_{j\in V\setminus i} a_j\Big)a_i. \qquad (63)$$

Since this utility function is of the form in (2), we can use the QNG filter of Section IV as summarized in Figs. 1 and 2 to determine subsequent BNE production levels. The explicit form of the equilibrium equation in (8) is 

$$a^*_{i,t}(h_{i,t}) = \frac{1}{2}\mathbb{E}_{i,t}[\theta] - \frac{1}{2}\sum_{j\in V\setminus i}\mathbb{E}_{i,t}\left[a^*_{j,t}(h_{j,t})\right]. \qquad (64)$$

It is immediate from (64) that when $\mathbb{E}_{i,t}[\theta] \le 0$ it is best for firm i to shut down production. To avoid boundary conditions we restrict attention to cases where the private signals $\mathbf{x}$ are such that $\mathbb{E}_{i,t}[\theta] > 0$ for all $i \in V$ and $t \in \mathbb{N}$. This can be guaranteed if all private signals are nonnegative, i.e., $\mathbf{x} \ge 0$. In a game with complete information all private signals $\mathbf{x}$ are known to all agents. In this case the (regular) Nash equilibrium actions of all agents coincide and are given by 

$$a^*_i = \frac{\mathbb{E}[\theta \mid \mathbf{x}]}{N+1} \quad \text{for all } i \in V. \qquad (65)$$
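The complete-information equilibrium in (65) can be verified as a fixed point of the best response in (64) once every expectation is replaced by the common estimate $\mathbb{E}[\theta \mid \mathbf{x}]$. A minimal Python sketch, assuming NumPy; the function names are hypothetical.

```python
import numpy as np

def cournot_best_response(a, theta_hat):
    """Best response of every firm to the others' quantities, cf. (64) with known actions."""
    a = np.asarray(a, dtype=float)
    others = a.sum() - a                     # total production of the other firms
    return 0.5 * theta_hat - 0.5 * others

def cournot_nash(theta_hat, N):
    """Symmetric complete-information equilibrium quantity, cf. (65)."""
    return np.full(N, theta_hat / (N + 1))
```

With $N = 5$ and an estimate of 12, each firm produces 2 units, and `cournot_best_response` maps that profile to itself. The sketch only checks the fixed point. Simultaneous best-response iteration is not used here because it need not converge for this game when $N \ge 3$.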



The numerical simulations in the next section show that the BNE strategies in (64) converge to the (regular) Nash equilibrium strategy (65) in a finite number of steps. 

[Figure 4: panels (a), (b), (c); horizontal axis: Time.] 

Fig. 4. Agents' actions over time for the Cournot competition game and networks shown in Fig. 3. Each line indicates the quantity produced by an individual firm at each stage. Actions converge to the Nash equilibrium action of the complete information game in a number of steps equal to the diameter of the network. 

[Figure 5: panels (a), (b), (c); horizontal axis: Time.] 

Fig. 5. Normed error in estimates of private signals, $\|\mathbf{x} - \mathbb{E}_{i,t}[\mathbf{x}]\|_2$, for the Cournot competition game and networks shown in Fig. 3. Each line corresponds to an agent's normed error in mean estimates of private signals over the time horizon. While all of the agents learn the true values of all the private signals in line and ring networks, in the star network only the central agent learns all of the private signals. 



A. Learning in Cournot competition 

The underlying effective unit profit is chosen as $\theta = 12$ \$/unit. Firms observe private signals with the additive noise term coming from a standard normal distribution, i.e., $\epsilon_i \sim \mathcal{N}(0, 1)$. Given this setting, we consider three benchmark networks: a line network with N = 5 firms, a star network with N = 5 firms, and a ring network with N = 10 firms (see Fig. 3). 

The quantities produced by firms over time are shown in Fig. 4 for the line (a), star (b) and ring (c) networks. In all of the cases, we observe consensus in the units produced. Furthermore, the consensus production $a^*$ is optimal; that is, firms converge to the Nash equilibrium under complete information (65). This implies that all of the firms learn the best estimate of $\theta$ by the convergence time T, that is, $\mathbb{E}_{i,T}[\theta \mid h_{i,T}] = \mathbb{E}[\theta \mid \mathbf{x}]$ for all $i \in V$. 

Figs. 5(a)-(c) show the error in estimation of private signals $\|\mathbf{x} - \mathbb{E}_{i,t}[\mathbf{x}]\|_2$ for all $i \in V$ and $t \in \mathbb{N}$. In Figs. 5(a) and 5(c), corresponding to line and ring networks, the mean square error in private signal estimates goes to zero for all of the firms at the end of the convergence time T. On the other hand, in the star network in Fig. 5(b), except for the center firm 5, none of the other firms has zero mean square error in private signal estimates. This means that these firms do not learn at least one of the private signals. As we know from Fig. 4(b), all of the firms in the star network learn the best estimate of $\theta$ given all of the private signals. Hence, in the star network, firms only learn the sufficient statistic to estimate $\theta$ (which is the average of the private signals) rather than learning each of the private signals individually. 

Fig. 6. Mobile agents in a 3-dimensional coordination game. Agents observe initial noisy private signals on heading and take-off angles. Agents revise their estimates on true heading and take-off angles and coordinate their movement angles with each other through local observations. 

Figs. 4(a)-(c) suggest that convergence is achieved in $O(\Delta)$ steps where $\Delta$ is the diameter of the graph. In [4], it is argued that for distributed estimation problems in which the individual utility function is equal to $u_i(a_i, \theta) = -(a_i - \theta)^2$, convergence happens in $O(\Delta)$ steps for tree networks. Our results show that the convergence rate is $O(\Delta)$ not only for tree networks such as line and star networks but also for the ring network when the utility function is quadratic and includes actions of others. 
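Since convergence times are compared with the graph diameter, it can help to compute that diameter for the benchmark topologies. The following Python sketch uses breadth-first search on adjacency lists. The helper names and the zero-based node labels are choices of this sketch, and placing the star center at the last index is an arbitrary convention.

```python
from collections import deque

def diameter(neighbors):
    """Diameter of a connected undirected graph given as a list of adjacency lists."""
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != len(neighbors):
            raise ValueError("graph is not connected")
        return max(dist.values())
    return max(eccentricity(s) for s in range(len(neighbors)))

def line(n):
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

def star(n):
    center = n - 1                           # arbitrary choice of the center node
    return [list(range(n - 1)) if i == center else [center] for i in range(n)]

def ring(n):
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]
```

For the networks considered here, `diameter(line(5))`, `diameter(star(5))` and `diameter(ring(10))` evaluate to 4, 2 and 5, respectively.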

VII. Coordination Game 

A network of autonomous agents wants to align itself so that the agents move toward a goal $(x^*, y^*, z^*)$ in 3-dimensional space following a straight path, and at the same time maintain their initial starting formation. When the goal $(x^*, y^*, z^*)$ is far away, there exists a common correct direction of movement toward the goal characterized by the heading angle on the x-y plane $\phi \in [0°, 180°]$ and the take-off angle on the x-z plane $\psi \in [0°, 180°]$. Hence, the target movement direction is given by $\theta = [\phi, \psi]^T$. Fig. 6 illustrates a set of autonomous agents in 3-dimensional space and their heading and take-off angles, where the x, y, z axes are depicted for agent 1. 

Mobile agents have the goal of maintaining the starting formation while moving at equal speed by coordinating their movement direction with other agents. Agents need to coordinate with the entire population while communication is restricted to neighboring agents whose direction of movement they can observe. In this context, agent i's decision $\mathbf{a}_i \in [0°, 180°] \times [0°, 180°]$ represents the heading and take-off angles in the direction of movement. The estimation and coordination goals of agent i can be represented with the following payoff 

$$u_i(\mathbf{a}_i, \{\mathbf{a}_j\}_{j\in V\setminus i}, \theta) = -\frac{1-\lambda}{2}(\mathbf{a}_i - \theta)^T(\mathbf{a}_i - \theta) - \frac{\lambda}{2(N-1)}\sum_{j\in V\setminus\{i\}}(\mathbf{a}_i - \mathbf{a}_j)^T(\mathbf{a}_i - \mathbf{a}_j). \qquad (66)$$

The first term is the estimation error in the true heading and take-off angles. The second term is the coordination component that measures the discrepancy between the direction of movement and those of other agents. $\lambda$ is a constant in (0, 1) gauging the importance of the estimation term with respect to the coordination term. 



The same payoff formulation can be motivated by looking at learning in organizations [30]. In an organization, individuals share a set of common tasks and have the incentive to coordinate with other units. Each individual receives a private piece of information about the task that needs to be performed while only being able to share his information with those with whom he has direct contact in the organization. 



Note that the utility function is of the quadratic form given in (40) with vector states and vector actions. Hence, we can use the QNG filter in Section V as summarized in Algorithm 1. As postulated in (8), the explicit equilibrium equation for all $i \in V$ is 

$$\mathbf{a}^*_{i,t}(h_{i,t}) = (1-\lambda)\,\mathbb{E}_{i,t}[\theta] + \frac{\lambda}{N-1}\sum_{j\in V\setminus\{i\}}\mathbb{E}_{i,t}\left[\mathbf{a}^*_{j,t}(h_{j,t})\right]. \qquad (67)$$

In a game with complete information, the Bayes-Nash equilibrium actions of all agents coincide and are given by 

$$\mathbf{a}^*_i = \mathbb{E}[\theta \mid \mathbf{x}]. \qquad (68)$$

In the next section, we show that the equilibrium actions in (67) converge to the Bayes-Nash equilibrium with complete information as given by (68) in a finite number of steps. 
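That (68) is the fixed point of (67) under complete information can be seen by replacing every expectation with the common estimate and iterating the best response. The Python sketch below does this with simultaneous updates. It is not the QNG filter, only a check of the equilibrium equation, and the function names and the random starting point are assumptions of the sketch.

```python
import numpy as np

def coordination_best_response(actions, theta_hat, lam):
    """Best response (67) when all agents share the estimate theta_hat.

    actions is an (N, m) array with one row per agent; theta_hat has shape (m,).
    """
    actions = np.asarray(actions, dtype=float)
    N = actions.shape[0]
    others_mean = (actions.sum(axis=0) - actions) / (N - 1)   # average action of the other agents
    return (1.0 - lam) * np.asarray(theta_hat, dtype=float) + lam * others_mean

def iterate_best_response(actions, theta_hat, lam, iterations=100):
    """Repeat simultaneous best responses; contracts toward theta_hat for lam in (0, 1)."""
    for _ in range(iterations):
        actions = coordination_best_response(actions, theta_hat, lam)
    return actions
```

The deviation from the common estimate shrinks by at least a factor $\lambda$ per iteration, so with $\lambda = 0.5$ any starting profile of angles in $[0°, 180°]$ is driven to $\mathbb{E}[\theta \mid \mathbf{x}]$ up to numerical precision.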

A. Learning in coordination games 

The correct direction vector is chosen to be $\theta = [10°, 20°]^T$. We let $\lambda = 0.5$. The noise terms $\boldsymbol{\epsilon}_i$ are jointly Gaussian with mean zero and covariance matrix equal to the identity matrix. Having an identity covariance matrix implies that $\mathbb{E}[x_i[1]x_i[2]] = 0$. 

We evaluate equilibrium behavior in geometric and random networks with N = 50 agents, shown in Figs. 7(a) and (b), respectively. The geometric network is created by placing the agents randomly on a 4 meter x 4 meter square and connecting pairs with distance less than 1 meter between them. In the random network, any pair of agents are neighbors with probability 0.1. The geometric network in Fig. 7(a) has a diameter of $\Delta = 5$ whereas the random network in Fig. 7(b) has a diameter of $\Delta = 4$. 

[Figure 7: panels (a) and (b); axes in meters.] 

Fig. 7. Geometric (a) and random (b) networks with N = 50 agents. Agents are randomly placed on a 4 meter x 4 meter square. There exists an edge between any pair of agents less than 1 meter apart in the geometric network. In the random network, the connection probability between any pair of agents is independent and equal to 0.1. 
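Networks of the two kinds described above can be generated as in the following Python sketch. The seeds, function names and adjacency-list representation are assumptions of this sketch, and the particular realizations used in the paper are not reproduced.

```python
import random

def geometric_network(n, side=4.0, radius=1.0, seed=0):
    """Place n agents uniformly on a side x side square; connect pairs closer than radius."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(n)]
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
            if dx * dx + dy * dy < radius * radius:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return pos, nbrs

def random_network(n, p=0.1, seed=0):
    """Connect each pair of agents independently with probability p."""
    rng = random.Random(seed)
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs
```

Neither generator guarantees a connected graph. Since learning requires information to reach every agent, one option is to redraw the network until it is connected.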

The direction of movement of each agent over time is depicted in Figs. 8(a)-(d). Figs. 8(a) and 8(b) show the heading angle $\phi_i$ of agents in geometric and random networks, respectively. Figs. 8(c) and 8(d) show the take-off angle $\psi_i$ of agents in geometric and random networks, respectively. Fig. 8 illustrates that agents' movement directions converge to the best estimates in heading and take-off angles in a finite number of steps. As a result, at the end of the convergence time T, we have $\mathbb{E}_{i,T}[\phi \mid h_{i,T}] = \mathbb{E}[\phi \mid \mathbf{x}[1]]$ and $\mathbb{E}_{i,T}[\psi \mid h_{i,T}] = \mathbb{E}[\psi \mid \mathbf{x}[2]]$ for all $i \in V$. Further, the convergence time is in the order of the diameter for both of the networks. This means that agents learn the sufficient statistic to calculate best estimates in the amount of time it takes for information to propagate through the network. 

Fig. 8. Agents' actions over time for the coordination game and networks shown in Fig. 7. Values of agents' actions over time for heading angle $\phi_i$ (top) and take-off angle $\psi_i$ (bottom) in geometric (left) and random (right) networks, respectively. Action consensus happens in the order of the diameter of the corresponding networks. 

VIII. Conclusion 

In this paper we introduced the QNG filter that agents can run locally to update their beliefs and select equilibrium actions in repeated quadratic games with both information and payoff externalities. The QNG filter provides a mechanism to update beliefs in a Bayesian way when agents' initial prior over the state of the world is Gaussian. We began by showing that when the prior estimates of private signals are Gaussian with means equal to a linear combination of private signals, and the equilibrium strategies of agents are linear combinations of mean estimates of private signals, Bayesian updates of estimates of private signals and the underlying state follow a sequential LMMSE estimator. This meant that the estimates remain linear combinations of private signals, and hence, Gaussian. By induction, estimates remain Gaussian for all times if equilibrium actions that are linear in the mean of the estimates exist at all stages. Further, we derived an explicit recursion for tracking estimates of private signals and calculating equilibrium actions, which we leveraged to develop the QNG filter. We then extended the QNG filter to the case when the state of the world is a vector. We exemplified the QNG filter in a Cournot competition game and in the coordination of mobile agents in 3-dimensional space. In the former the state of the world, the effective profit, was a scalar, whereas in the latter the state of the world was a vector including heading and take-off angles. In both examples, the QNG filter converged to the BNE of the game in a number of steps that is in the order of the diameter of the network. This meant that agents learned the sufficient statistic of the state while not necessarily learning all the individual private signals. 

Appendix A 
Proof of Theorem [T} 



At time t = beliefs are normal and have the form in ( [T2| ). Indeed, since the only information available 
to agent i at time t = is the private signal Xi it follows from the linear observation model in ([T]) that 
this is the value assigned to the estimate of all private signals as well as to the estimate of the state 9, 

Ei,o [xj] = Xi for all j, Ei,o [9] = Xi. (69) 
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The elements of the matrix Lj o = lef are 1 in the ith column and otherwise. Therefore, the first 



expression in (69) is equivalent to the first expression in (34i. Likewise, since the ith element of is 



one with remaining elements zero, the second expression in (69 1 is equivalent to the second expression in 



( [34| ). As for the variances in ( [351 ), note that the initial estimate of x has error covariance matrix defined as 
in ([9]) for t = 0. By substituting initial mean estimates inside (|9]) and then using the fact that ejx. = xi, 
the error covariance matrix can be rewritten as 



X - IXi) (x - IXi 



(70) 



From (70 1, we get the following by using the fact that Xj -Xi = ej - et by ^, 



Mi,(0) =Ei,o {e-lei){e-l€iy 



(71) 



When we expand the terms in (71 ), we obtain the following 



=diag(c) - eil'^a - leja + II'^q 
=diag(c) + eiefa - e^ef c. 



(72) 
(73) 
(74) 



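Since private signals are independent among agents, that is, $\mathbb{E}_{i,0}[\epsilon_k\epsilon_j] = 0$ for all $j \in V \setminus k$ and $k \in V$, 
we have $\mathbb{E}_{i,0}[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T] = \mathrm{diag}(\mathbf{c})$ and $\mathbb{E}_{i,0}[\boldsymbol{\epsilon}\epsilon_i] = \mathbf{e}_i c_i$. Using these relations and the definition of the noise variance 
$c_i = \mathbb{E}[\epsilon_i^2]$, (73) follows from (72). Combining the second, third, and fourth terms in (73) through the identity 
$\mathbf{1}\mathbf{1}^T - \mathbf{e}_i\mathbf{1}^T - \mathbf{1}\mathbf{e}_i^T = \bar{\mathbf{e}}_i\bar{\mathbf{e}}_i^T - \mathbf{e}_i\mathbf{e}_i^T$, where $\bar{\mathbf{e}}_i = \mathbf{1} - \mathbf{e}_i$, we obtain the last two terms in (74). Now, observe that 
$\mathrm{diag}(\mathbf{c}) - \mathbf{e}_i\mathbf{e}_i^T c_i = \mathrm{diag}(\bar{\mathbf{e}}_i)\mathrm{diag}(\mathbf{c})$, hence (74) can be rewritten as in (35). 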
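The step from (71) to (35) can be checked numerically. The following is a minimal sketch, not part of the original development: it writes the error vector as $\mathbf{x} - \mathbf{1}x_i = (I - \mathbf{1}\mathbf{e}_i^T)\boldsymbol{\epsilon}$, computes $A\,\mathrm{diag}(\mathbf{c})A^T$ with $A = I - \mathbf{1}\mathbf{e}_i^T$, and compares it entry by entry with $\mathrm{diag}(\bar{\mathbf{e}}_i)\mathrm{diag}(\mathbf{c}) + \bar{\mathbf{e}}_i\bar{\mathbf{e}}_i^T c_i$. The function names and the variance values are illustrative assumptions; exact rational arithmetic is used so the comparison has no rounding error.

```python
from fractions import Fraction


def initial_cov_by_expansion(c, i):
    """E[(eps - 1 eps_i)(eps - 1 eps_i)^T] computed as A diag(c) A^T, A = I - 1 e_i^T."""
    n = len(c)
    # A[r][s] = delta_{rs} - delta_{si}, so that (A eps)_r = eps_r - eps_i.
    A = [[Fraction(int(r == s) - int(s == i)) for s in range(n)] for r in range(n)]
    # (A diag(c) A^T)[r][s] = sum_k A[r][k] * c[k] * A[s][k]
    return [[sum(A[r][k] * c[k] * A[s][k] for k in range(n)) for s in range(n)]
            for r in range(n)]


def initial_cov_closed_form(c, i):
    """diag(ebar_i) diag(c) + c_i ebar_i ebar_i^T with ebar_i = 1 - e_i."""
    n = len(c)
    ebar = [0 if r == i else 1 for r in range(n)]
    return [[ebar[r] * c[r] * int(r == s) + c[i] * ebar[r] * ebar[s] for s in range(n)]
            for r in range(n)]


# Hypothetical noise variances for four agents (illustrative values only).
c = [Fraction(1, 2), Fraction(2), Fraction(3, 4), Fraction(5)]
for i in range(len(c)):
    assert initial_cov_by_expansion(c, i) == initial_cov_closed_form(c, i)
```

The $i$th row and column of the resulting matrix are zero, which reflects that agent $i$ has no uncertainty about its own signal.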



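Consider the variance of $\theta$ defined in (10) at time $t = 0$. Substituting $\mathbb{E}_{i,0}[\theta] = x_i$ inside (10), we have 

$$M^i_{\theta\theta}(0) = \mathbb{E}_{i,0}\left[(\theta - x_i)^2\right]. \qquad (75)$$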



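By the signal structure (1) with additive zero-mean Gaussian term $\epsilon_i$, we have $\theta - x_i = -\epsilon_i$. As a result, 
$M^i_{\theta\theta}(0) = \mathbb{E}_{i,0}[\epsilon_i^2]$, which is in turn equal to $c_i$. Next consider the cross-covariance between $\theta$ and $\mathbf{x}$ 
defined in (11) at time $t = 0$, 

$$\begin{aligned}
M^i_{\theta\mathbf{x}}(0) &= \mathbb{E}_{i,0}\left[(\theta - \mathbb{E}_{i,0}[\theta])(\mathbf{x} - \mathbb{E}_{i,0}[\mathbf{x}])^T\right] && (76)\\
&= \mathbb{E}_{i,0}\left[(-\epsilon_i)(\boldsymbol{\epsilon} - \mathbf{1}\epsilon_i)^T\right]. && (77)
\end{aligned}$$

The second equality follows by substitution of the initial mean estimates and then using the definition of 
private signals (1). Next, we multiply out the terms in (77) and use independence of private signals between 
agents to get 

$$M^i_{\theta\mathbf{x}}(0) = -\mathbb{E}_{i,0}[\epsilon_i\boldsymbol{\epsilon}^T] + \mathbb{E}_{i,0}[\epsilon_i^2]\mathbf{1}^T = -c_i\mathbf{e}_i^T + c_i\mathbf{1}^T = c_i\bar{\mathbf{e}}_i^T,$$

where $\bar{\mathbf{e}}_i = \mathbf{1} - \mathbf{e}_i$. 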
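The moments in (75) and (77) can also be estimated by simulating the signal model (1). The sketch below is an illustration under stated assumptions rather than part of the proof: it draws $x_j = \theta + \epsilon_j$ with independent zero-mean Gaussian noise of variance $c_j$, and averages $(\theta - x_i)^2$ and $(\theta - x_i)(x_j - x_i)$. The function name, the sample size, the seed, and the distribution used to draw $\theta$ are all hypothetical choices; the draw of $\theta$ cancels in both error terms, so any distribution would do.

```python
import random


def simulate_initial_theta_moments(c, i, num_samples=100_000, seed=0):
    """Monte Carlo estimates of E[(theta - x_i)^2] and E[(theta - x_i)(x_j - x_i)] for all j."""
    rng = random.Random(seed)
    n = len(c)
    var_theta = 0.0
    cross = [0.0] * n
    for _ in range(num_samples):
        theta = rng.gauss(0.0, 10.0)  # arbitrary draw: theta cancels in the errors below
        eps = [rng.gauss(0.0, cj ** 0.5) for cj in c]  # noise with variance c_j
        x = [theta + e for e in eps]  # private signals x_j = theta + eps_j
        err_theta = theta - x[i]  # equals -eps_i up to rounding
        var_theta += err_theta ** 2
        for j in range(n):
            cross[j] += err_theta * (x[j] - x[i])
    return var_theta / num_samples, [v / num_samples for v in cross]


if __name__ == "__main__":
    # Hypothetical variances for three agents; agent i = 1 has c_i = 2.0.
    var_theta, cross = simulate_initial_theta_moments([1.0, 2.0, 0.5], i=1)
    print(var_theta, cross)
```

Under these assumptions the variance estimate should be close to $c_i$, and the cross terms should be close to $c_i$ for $j \neq i$ and exactly zero for $j = i$, since $x_i - x_i = 0$ in every sample.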



The inductive hypothesis is then true at time $t = 0$ with the explicit initializations in (34) and (35). 
Lemma 2 has already shown that if the inductive hypothesis is true at time $t$, it is also true at time $t + 1$. 
It also provided the explicit recursions in (22)-(23) and (25)-(29). Lemma 1 further shows that the action 
coefficients $\mathbf{v}_{j,t}$ can be computed by solving the system of linear equations in (15). 



Appendix B 
Proof of Theorem 2 



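At time $t = 0$, agents' beliefs are normal and have the form in (42). Since the only information available 
to agent $i$ at time $t = 0$ is the private signal $\mathbf{x}_i$, it follows from the observation model in (38) that agent $i$ 
assigns $\mathbf{x}_i$ as his mean estimates of the underlying parameter vector and the private signals as in (57)-(58). 
Next, consider the initial error covariance matrix $M^i_{\mathbf{x}\mathbf{x}}(0)$, 

$$\begin{aligned}
M^i_{\mathbf{x}\mathbf{x}}(0) &= \mathbb{E}_{i,0}\left[(\mathbf{x} - \mathbb{E}_{i,0}[\mathbf{x}])(\mathbf{x} - \mathbb{E}_{i,0}[\mathbf{x}])^T\right] && (78)\\
&= \mathbb{E}_{i,0}\left[\begin{pmatrix} \mathbf{x}[1] - \mathbf{1}x_i[1] \\ \vdots \\ \mathbf{x}[N] - \mathbf{1}x_i[N] \end{pmatrix}\begin{pmatrix} \mathbf{x}[1] - \mathbf{1}x_i[1] \\ \vdots \\ \mathbf{x}[N] - \mathbf{1}x_i[N] \end{pmatrix}^T\right]. && (79)
\end{aligned}$$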



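Substituting the initial mean estimates (58) in (78) and using the fact that $\mathbf{1}\mathbf{e}_i^T\mathbf{x}[n] = \mathbf{1}x_i[n]$, we get (79). 
Let $\boldsymbol{\epsilon}[n] := [\epsilon_1[n], \ldots, \epsilon_N[n]]^T \in \mathbb{R}^N$ denote the noise values of agents on the $n$th state of the world. 
Then we can write each $N \times N$ block of the matrix obtained in (79) as follows 

$$\mathbb{E}_{i,0}\left[(\boldsymbol{\epsilon}[k] - \mathbf{1}\epsilon_i[k])(\boldsymbol{\epsilon}[l] - \mathbf{1}\epsilon_i[l])^T\right]. \qquad (80)$$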



Since the initial private signals of agent $i$ are assumed to be independent of each other, that is, $\mathbb{E}_{i,0}[\epsilon_i[k]\epsilon_i[l]] = 0$ 
for all $k = 1, \ldots, m$ and $l \neq k$, (80) is zero when $k \neq l$. When $k = l$, (80) is equivalent to (71). As 
a result, for the $N \times N$ blocks at the diagonals of $M^i_{\mathbf{x}\mathbf{x}}(0)$, we obtain (59), which is similar to its scalar 
counterpart given in (35). Consider the variance of $\theta$ at time $t = 0$. Using (75), we obtain that $M^i_{\theta\theta}(0)$ 
is as given in (60). The initial cross covariance can also be calculated using the initial mean estimates in 
(57) and (58) in a similar way. 
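The block structure described above can be made concrete with a short sketch. It is only an illustration under stated assumptions: noise is taken to be independent across agents and across state components, each component $n$ has its own hypothetical variance vector `c_by_state[n]`, and every diagonal block is assumed to take the scalar form $\mathrm{diag}(\bar{\mathbf{e}}_i)\mathrm{diag}(\mathbf{c}) + \bar{\mathbf{e}}_i\bar{\mathbf{e}}_i^T c_i$ with $\bar{\mathbf{e}}_i = \mathbf{1} - \mathbf{e}_i$. The function names and the stacking order (one $N \times N$ block per state component) are illustrative choices and are not taken from (59).

```python
def scalar_block(c, i):
    """Scalar-case initial covariance: diag(ebar_i) diag(c) + c_i ebar_i ebar_i^T."""
    n = len(c)
    ebar = [0 if r == i else 1 for r in range(n)]
    return [[ebar[r] * c[r] * int(r == s) + c[i] * ebar[r] * ebar[s] for s in range(n)]
            for r in range(n)]


def stacked_initial_cov(c_by_state, i):
    """Assemble a block-diagonal matrix with one scalar-case block per state component.

    c_by_state[k][j] is the (hypothetical) noise variance of agent j on component k.
    Off-diagonal blocks stay zero, matching the k != l case of the cross terms.
    """
    num_states = len(c_by_state)
    n = len(c_by_state[0])
    M = [[0] * (n * num_states) for _ in range(n * num_states)]
    for k, c in enumerate(c_by_state):
        block = scalar_block(c, i)
        for r in range(n):
            for s in range(n):
                M[k * n + r][k * n + s] = block[r][s]
    return M


if __name__ == "__main__":
    # Two agents, two state components, illustrative variances.
    for row in stacked_initial_cov([[1, 2], [3, 4]], i=0):
        print(row)
```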
Given the normal prior $P_{i,0}([\boldsymbol{\theta}^T, \mathbf{x}^T])$ with mean estimates given by (57)-(58), the inductive hypothesis 
in Lemma 3 is satisfied at time $t = 0$. Further, by our assumption there exists a linear equilibrium action 
with weights $U_{j,0}$ that can be calculated by solving the set of equations in (44). Lemma 4 already provides 
a way to propagate beliefs when agents play according to a linear equilibrium strategy. Furthermore, by 
Lemma 4, if the inductive hypothesis is true at time $t$ then it is also true at time $t + 1$. 